Foreword


It is my great pleasure to write the foreword for this excellent and timely book. 
Games have long been seen as the perfect test-bed for artificial intelligence (AI) 
methods, and are also becoming an increasingly important application area. Game 
AI is a broad field, covering everything from the challenge of making super-human 
AI for difficult games such as Go or StarCraft, to creative applications such as the
automated generation of novel games. 

Game AI is as old as AI itself, but over the last decade the field has seen mas- 
sive expansion and enrichment with the inclusion of video games, which now com- 
prise more than 50% of all published work in the area and enable us to address a 
broader range of challenges that have great commercial, social, economic and scien- 
tific interest. A great surge in research output occurred in 2005, coinciding with both 
the first IEEE Symposium (Conference) on Computational Intelligence and Games 
(CIG)—which I co-chaired with Graham Kendall—and the first AAAI AIIDE Conference
(Artificial Intelligence in Digital Entertainment). Since then this rich area of
research has been more explored and better understood. The Game AI community 
pioneered much of the research which is now becoming (or about to become) more 
mainstream AI, such as Monte Carlo Tree Search, procedural content generation, 
playing games based on screen capture, and automated game design. 

Over the last decade, progress in deep learning has had a profound and transfor- 
mational effect on many difficult problems, including speech recognition, machine 
translation, natural language understanding and computer vision. As a result,
computers can now achieve human-competitive performance in a wide range of perception
and recognition tasks. Many of these systems are now available to the programmer
via a range of so-called cognitive services. More recently, deep reinforcement
learning has achieved ground-breaking success in a number of difficult challenges, 
including Go and the amazing feat of learning to play games directly from screen 
capture (playing from pixels). It is fascinating to contemplate what this could mean 
for games as we stumble towards human-level intelligence in an increasing number 
of areas. The impacts will be significant for the intelligence of in-game characters, 
the way in which we interact with them and for the way games are designed and 
tested. 

This book makes an enormous contribution to this captivating, vibrant area of
study: an area that is developing rapidly both in breadth and depth as AI is able to 
cope with a wider range of tasks (and to perform those tasks to increasing levels of 
excellence). The service to the community will be felt for many years to come; the
book provides an easier and more comprehensive entry point for newcomers to the 
field than previously available, whilst also providing an indispensable reference for 
existing AI and Games researchers wishing to learn about topics outside their direct 
field of interest. 

Georgios Yannakakis and Julian Togelius have been involved with the field ever 
since its widespread expansion to video games, and they both presented papers at 
the first 2005 CIG. Over the years they have made an enormous contribution to the 
field with a great number of highly cited papers presenting both novel research and 
comprehensive surveys. It is my opinion that these authors are best qualified to write 
this book, and they do not disappoint. The book will serve the community very well 
for many years to come. 

London, August 2017 Simon Lucas 



Preface 


Of all the things that wisdom provides for the complete happiness of one’s entire life, by far
the greatest is friendship. 


Epicurus, Principal Doctrines, 27 

Human beings, viewed as behaving systems, are quite simple. The apparent complexity of
our behavior over time is largely a reflection of the complexity of the environment in which
we find ourselves.


Herbert A. Simon 

It would be an understatement to say that Artificial Intelligence (AI) is a popular
topic at the moment, and it is unlikely to become any less important in the future. 
More researchers than ever work on AI in some form, and more non-researchers than 
ever are interested in the field. It would also be an understatement to say that games 
are a popular application area for AI research. While board games have been central
to AI research since the inception of the field, video games have during the last 
decade increasingly become the domain of choice for testing and showcasing new 
algorithms. At the same time, video games themselves have become more diverse 
and sophisticated, and some of them incorporate advances in AI for controlling 
non-player characters, generating content or adapting to players. Game developers 
have increasingly realized the power of AI methods to analyze large volumes of 
player data and optimize game designs. And a small but growing community of 
researchers and designers experiment with ways of using AI to design and create 
complete games, automatically or in dialog with humans. It is indeed an exciting 
time to be working on AI and games! 

This is a book about AI and games. As far as we know, it is the first comprehensive
textbook covering the field. By comprehensive, we mean that it features
all the major application areas of AI methods within games: game-playing, content
generation and player modeling. We also mean that it discusses AI problems
in many different types of games, including board games and video games of many 
genres. The book is also comprehensive in that it takes multiple perspectives of AI 
and games: how games can be used to test and develop AI, how AI can be used 
to make games better and easier to develop, and to understand players and design.
While this is an academic book which is primarily aimed at students and researchers, 
we will frequently address problems and methods relevant for game designers and 
developers. 

We wrote this book based on our long experience doing research on AI for games, 
each on our own and together, and helping lead and shape the research community. 
We both independently started researching AI methods in games in 2004, and we 
have been working together since 2009. Together, we played a role in introducing 
research topics such as procedural content generation and player modeling to the 
academic research community, and created several of the most widely used game- 
based AI benchmarks. This book is in a sense a natural outgrowth of the classes on 
AI and games we have taught at three universities, and the several survey papers of 
the field and of individual research topics within it that we have published over the 
years. But the book is also a response to the lack of a good introductory book for the 
research field. Early discussions on writing such a book date back at least a decade, 
but no-one actually wrote one, until now. 

It could be useful to point out what this book is not. It is not a hands-on book 
with step-by-step instructions on how to build AI for your game. It does not feature 
discussions of any particular game engine or software framework, and it does not
discuss software engineering aspects or many implementation aspects at all. It is not
an introductory book, and it does not give a gentle introduction to basic AI or game 
design concepts. For all these roles, there are better books available. 

Instead, this is a book for readers who already understand AI methods and concepts
to the level of having taken an introductory AI course, and the introductory
computer science or engineering courses that led up to that course. The book assumes
that the reader is comfortable with reading a pseudocode description of an
algorithm and implementing it. Chapter 2 is a summary of AI methods used in the
book, but is intended more as a reference and refresher than as an introduction. The 
book also assumes a basic familiarity with games, if not designing them then at least 
playing them. 

The use case for this textbook that we had in mind when writing it is for a one- 
or a two-semester graduate-level or advanced undergraduate-level class. This can
take several different shapes to support different pedagogical practices. One way of 
teaching such a class would be a traditional class, with lectures covering the chapters 
of the book in order, a conventional pen-and-paper exam at the end, and a small 
handful of programming exercises. For your convenience, each of the main chapters 
of the book includes suggestions for such exercises. Another way of organizing a
class around this book, more in line with how we personally prefer to teach such 
courses, is to teach the course material during the first half of the semester and
spend the second half on a group project. 

The material offered by this book can be used in various ways and, thus, support 
a number of different classes. In our experience, a traditional two-semester class on 
game artificial intelligence would normally cover Chapter 2 and Chapter 3 in the first
semester and then focus on alternative uses of AI in games (Chapters 4 and 5) in the
second semester. When teaching the material in compressed (one-semester) fashion
instead, it is advisable to skip Chapter 2 (using it as a reference when needed),
and focus the majority of the lectures on Chapters 3, 4 and 5. Chapters 6 and 7
can be used as material for inspiring advanced graduate-level projects in the area.
Beyond the strict limits of game AI, Chapter 4 (or sections of it) can complement
classes with a focus on game design or computational creativity whereas Chapter 5
can complement classes with a focus on affective computing, user experience, and 
data mining. It is of course also possible to use this book for an introductory undergraduate
class for students who have not taken an AI class before, but in that case
we advise the instructor to select a small subset of topics to focus on, and to complement
the book with online tutorials on specific methods (e.g., best-first search,
evolutionary computation) that introduce these topics in a more gentle fashion than
this book does. 


Chania, Crete, Greece Georgios N. Yannakakis 

New York, NY, USA Julian Togelius 

September 2017 


Chapter 1

Introduction 


Artificial Intelligence (AI) has seen immense progress in recent years. It is both a
thriving research field featuring an increasing number of important research areas 
and a core technology for an increasing number of application areas. In addition 
to algorithmic innovations, the rapid progress in AI is often attributed to increasing
computational power due to hardware advancements. The success stories of AI
can be experienced in our daily lives through its many practical applications. AI advances
have enabled better understanding of images and speech, emotion detection,
self-driving cars, web searching, AI-assisted creative design, and game-playing,
among many other tasks; for some of these tasks machines have reached human- 
level status or beyond. 

There is, however, a difference between what machines can do well and what 
humans are good at. In the early days of AI, researchers envisaged computational 
systems that would exhibit aspects of human intelligence and achieve human-level
problem solving or decision making skills. These problems were presented to the 
machines as a set of formal mathematical notions within rather narrow and controlled
spaces, which could be solved by some form of symbol manipulation or
search in symbolic space. The highly formalized, symbolic representation allowed 
AI to succeed in many cases. Naturally, games—especially board games—have
been a popular domain for early AI attempts as they are formal and highly constrained,
yet complex, decision making environments.

Over the years the focus of much AI research has shifted to tasks that are rela- 
tively simple for humans to do but are hard for us to describe how to do, such as 
remembering a face or recognizing our friend’s voice over the phone. As a result, AI
researchers began to ask questions such as: How can AI detect and express emotion?
How can AI educate people, be creative or artistically novel? How can AI play a
game it has not seen before? How can AI learn from a minimal number of trials?
How can AI feel guilt? All these questions pose serious challenges to AI and correspond
to tasks that are not easy for us to formalize or define objectively. Perhaps
surprisingly (or unsurprisingly after the fact), tasks that require relatively low cognitive
effort from us often turn out to be much harder for machines to tackle. Again,
games have provided a popular domain to investigate such abilities as they feature
aspects of a subjective nature that cannot be formalized easily. These include, for
instance, the experience of play or the creative process of game design [599].

© Springer International Publishing AG, part of Springer Nature 2018

Ever since the birth of the idea of artificial intelligence, games have been help- 
ing AI research progress. Games not only pose interesting and complex problems 
for AI to solve—e.g., playing a game well; they also offer a canvas for creativity 
and expression which is experienced by users (people or even machines!). Thus, 
arguably, games are a rare domain where science (problem solving) meets art and
interaction: these ingredients have made games a unique and favorite domain for the
study of AI. But it is not only AI that is advanced through games; games have also 
been advanced through AI research. We argue that AI has been helping games to get 
better on several fronts: in the way we play them, in the way we understand their 
inner functionalities, in the way we design them, and in the way we understand play, 
interaction and creativity. This book is dedicated to all aspects of the intersection of 
games and AI and the numerous ways both games and AI have been challenged, but 
nevertheless, advanced through this relationship. It is a book about AI for games and 
games for AI. 


1.1 This Book 

The study of AI in and for games is what this book defines as the research field of 
game artificial intelligence (in brief game AI, also occasionally referred to as AI 
and games). The book offers an academic perspective of game AI and serves as 
a comprehensive guidebook for this exciting and fast-moving research field. Game
AI—in particular video game or computer game AI—has seen major advancements
in the (roughly) fifteen years of its existence as a separate research field. During this
time, the field has seen the establishment and growth of important yearly meetings—
including the IEEE Conference on Computational Intelligence and Games (CIG)
and the AAAI Artificial Intelligence and Interactive Digital Entertainment (AIIDE)
conference series—as well as the launch of the IEEE Transactions on Computational
Intelligence and AI in Games (TCIAIG) journal—which will
be renamed IEEE Transactions on Games (ToG) from January 2018. Since
the early days of game AI we have seen numerous success stories within the several
subareas of this growing and thriving research field. We can nowadays use AI
to play many games better than any human, we can design AI bots that are more
believable and human-like than human players, we can collaborate with AI to design
better and unconventional (aspects of) games, we can better understand players
and play by modeling the overall game experience, we can better understand game 
design by modeling it as an algorithm, and we can improve game design and tune 
our monetization strategy by analyzing massive amounts of player data. This book 
builds on these success stories and the algorithms that took us there by exploring 
the different uses of AI for games and games for AI. 


1.1.1 Why Did We Write This Book?

Both of us have been teaching and researching game artificial intelligence at under- 
graduate and graduate levels in various research and educational institutions across 
the globe for over a decade. Both of us have felt, at times, that a comprehensive 
textbook on game AI was necessary for our students and a service to the learning
objectives of our programs. Meanwhile, an increasing number of fellow academics 
felt the same way. Such a book was not available, and given our extensive experi- 
ence with the field, we felt we were well placed to write the book we needed. Given 
that we have been collaborating on game AI research since 2009, and have known each
other since 2005, we knew our perspective was coherent enough to actually agree 
on what should go into the book without undue bickering. While we have been try- 
ing hard to write a book that will appeal to many and be useful for both students 
and researchers from different backgrounds, it ultimately reflects our perspective of 
what game AI is and what is important within the field.

Looking at the existing literature to allocate readings for a potential course on 
game AI one can rely partly on a small number of relevant and recent surveys 
and vision papers for specific game AI research areas. Examples include papers
meant to serve as general introductions to game AI [407, 764, 785], general game
AI [718], Monte-Carlo Tree Search [77], procedural content generation [783, 720],
player modeling [782], emotion in games [781], computational narrative [562], AI
for game production [564], neuroevolution in games [567], and AI for games on
mobile devices [265]. There are also some earlier surveys reflecting the state of the
art at the beginning of this research field, for example, on evolutionary computation
in games [406] and computational intelligence in games [405]. No mere paper
can however on its own cover the breadth and depth required by a full course on 
game AI. For this reason, the courses we have taught have generally been structured 
around a set of papers, some of them surveys and some of them primary research 
papers, together with slides and course notes. 

The first, recently published, edited volumes on game AI research have been
great assets for teaching needs in game AI. These are books with a focus on a
particular area of game AI research such as procedural content generation [616],
emotion in games [325] and game data mining [186]. Because of their more narrow
domains they cannot serve as textbooks for a complete course on game AI but
rather as parts of a game AI course or, alternatively, as textbooks for independent 
courses on procedural content generation, affective computing or game data mining, 
for instance. 

Meanwhile several edited volumes or monographs which have covered aspects 
of game AI programming are edited or written by game AI experts from the game 
industry. These include the popular game AI programming wisdom series [546, 547,
548, 549] and other game AI programming volumes 06041 l8l 155^155311^ 1^14251 .
These books, however, target primarily professional or indie developers, game AI 
programmers and practitioners, and do not always fulfill the requirements of an academic
textbook. As you will see, this is only one part of the field of game AI as we
define it. Further, some of the earlier books are somewhat outdated by now given the 
fast pace of progress of the game AI research field [109, 62]. Among the industry-focused
game AI books there are a few that aim to target educators and students
of game AI. They have a rather narrow scope, however, as they are limited to
non-player character AI m, which is arguably the most important topic for game AI
practitioners in the gaming industry [764, 425] but only one of several research areas
in academic game AI research [785]. In our terminology, the perspective of these
industry-focused textbooks tends to be almost exclusively on what we call playing
for experience, in particular generating interesting non-player character (NPC) behavior
that looks lifelike and functions within the confines of a game design. Finally
there are game AI books that are tied to a particular language or software suite such
as Lua fm\ or Unity |l3ll, which also limits their usefulness as general textbooks.

In contrast to the above list of books, edited volumes and papers, this book aims 
to present the research field as a whole and serve (a) as a comprehensive textbook 
for game artificial intelligence, (b) as a guidebook for game AI programming, and 
(c) as a field guide for researchers and graduate students seeking to orient themselves
within this multifaceted research field. For this reason, we both detail the
state of knowledge of the field and also present research and original scholarship in 
game AI. Thus the book can be used for both research-based teaching and hands- 
on applications of game AI. We detail our envisaged target audience in the section 
below. 


1.1.2 Who Should Read This Book? 

With this book we hope to reach readers with a general interest in the application of 
AI to games, who already know at least the basics of artificial intelligence. However,
while writing the book we particularly envisioned three groups of people benefiting 
directly from this book. The first group is university students, of graduate or ad- 
vanced undergraduate level, who wish to learn about AI in games and use it to de- 
velop their career in game AI programming or in game AI research. In particular, we 
see this book being used in advanced courses for students who have already taken 
an introductory AI course, but with care and some supplementary material it could 
be used for an introductory course as well. The second group is AI researchers and 
educators who want to use this book to inspire their research or, instead, use it as 
a textbook for a class in artificial intelligence and games. We particularly think of 
active researchers within some AI-related field wanting to start doing research in
game AI, and new Ph.D. students in the area. The last target audience is computer 
game programmers and practitioners who have limited AI or machine learning
background and wish to explore the various creative uses of AI in their game or
software application. Here we provide a complement to the more industry-focused
books listed above by taking a broader view of what AI in and for games could be. 
For further fostering the learning process and widening the practical application of 
AI in games the book is accompanied by a website that features lectures, exercises 
and additional resources such as readings and tools. 

This book is written with the assumption that its readers come from a technical
background such as computer science, software engineering or applied math.
We assume that our readers have taken courses on the fundamentals of artificial in- 
telligence (or acquired this knowledge elsewhere) as the book does not cover the 
algorithms in detail; our focus, instead, is on the use of the algorithms in games and 
their modification for that purpose. To be more specific, we assume that the reader 
is familiar with core concepts in tree search, optimization, supervised learning, un- 
supervised learning and reinforcement learning, and has implemented some basic 
algorithms from these categories. Chapter 2 provides an overview of core methods
for game AI and a refresher for the reader whose knowledge is a bit rusty. We
also assume familiarity with programming and a basic understanding of algebra and 
calculus. 


1.1.3 A Short Note on Terminology 

The term “artificial and computational intelligence in games” is often used to refer
to the entire field covered in this book (e.g., see the title of [785]). This reflects
the dual roots of the field in artificial intelligence and computational intelligence
(CI) research, and the use of these terms in the names of the major conferences in
the field (AIIDE and CIG) and the flagship journal (IEEE TCIAIG) explicitly targeting
both CI and AI research. There is no agreement on the exact meaning of
the terms AI and CI. Historically, AI has been associated with logic-based or sym- 
bolic methods such as reasoning, knowledge representation and planning, and CI 
has been associated with biologically-inspired or statistical methods such as neural 
networks (including what is now known as deep learning) and evolutionary compu- 
tation. However, there is considerable overlap and strong similarities between these 
fields. Most of the methods proposed in both fields aim to make computers perform 
tasks that have at some point been considered to require intelligence to perform, and 
most of the methods include some form of heuristic search. The field of machine 
learning intersects with both AI and CI, and many techniques could be said to be 
part of either field. 

In the rest of the book we will use the terms “AI and games”, “AI in games” 
and “game AI” to refer to the whole research field, including those approaches that 
originally come from the CI and machine learning fields. There are three reasons for 
this: simplicity, readability, and that we think that the distinction between CI and AI 
is not useful for the purposes of this book or indeed the research field it describes.
Our use of these terms is not intended to express any prejudice towards particular 
methods or research questions. (For a non-exclusive list of methods we believe are
part of “AI” according to this definition, see Chapter 2.)


1.2 A Brief History of Artificial Intelligence and Games

Games and artificial intelligence have a long history together. Much research on AI 
for games is concerned with constructing agents for playing games, with or without 
a learning component. Historically, this has been the first and, for a long time, the
only approach to using AI in games. Even before artificial intelligence was
recognized as a field, early pioneers of computer science wrote game-playing programs
because they wanted to test whether computers could solve tasks that seemed
to require “intelligence”. Alan Turing, arguably the principal inventor of computer
science, (re)invented the Minimax algorithm and used it to play Chess [725]. The
first software that managed to master a game was programmed by A. S. Douglas
in 1952 on a digital version of the Tic-Tac-Toe game and as part of his doctoral
dissertation at Cambridge. A few years later, Arthur Samuel was the first to invent
the form of machine learning that is now called reinforcement learning using a
program that learned to play Checkers by playing against itself [591].

Most early research on game-playing AI was focused on classic board games, 
such as Checkers and Chess. There was a conception that these games, where great 
complexity can arise from simple rules and which had challenged the best human 
minds for hundreds or even thousands of years, somehow captured the essence of 
thought. After over three decades of research on tree search, in 1994, the Chinook 
Checkers player managed to beat the World Checkers Champion Marion Tinsley 
[594]; the game was eventually solved in 2007 [593]. For decades Chess was seen as
“the drosophila of AI” in the sense of being the “model organism” that countless
new AI methods were tested on [194]—at least until we developed software capable
of playing better than humans, at which point Chess-playing AI somehow seemed a
less urgent problem. The software that first exhibited superhuman Chess capability,
IBM’s Deep Blue, consisted of a Minimax algorithm with numerous Chess-specific
modifications and a very highly tuned board evaluation function running on a custom
supercomputer [98, 285]. Deep Blue famously won against the reigning grandmaster
of Chess, Garry Kasparov, in a much-publicized event back in 1997. Twenty
years later, it is possible to download public domain software that will play better
than any human player when running on a regular laptop. 
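
To make the Minimax idea mentioned above concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of Deep Blue or of Turing's program: it searches Tic-Tac-Toe exhaustively instead of using a depth limit and a tuned evaluation function, and the names `winner`, `minimax` and `best_move` are our own hypothetical helpers.

```python
from typing import List, Optional

Board = List[str]  # nine cells in row-major order: "X", "O" or " "

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


def winner(board: Board) -> Optional[str]:
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def minimax(board: Board, player: str) -> int:
    """Game value from X's point of view (+1 win, 0 draw, -1 loss)
    when `player` is about to move and both sides play optimally."""
    won = winner(board)
    if won is not None:
        return 1 if won == "X" else -1
    moves = [i for i, cell in enumerate(board) if cell == " "]
    if not moves:
        return 0  # board is full and nobody has won
    other = "O" if player == "X" else "X"
    values = []
    for move in moves:
        board[move] = player   # try the move
        values.append(minimax(board, other))
        board[move] = " "      # undo it
    # X maximizes the value, O minimizes it.
    return max(values) if player == "X" else min(values)


def best_move(board: Board, player: str) -> Optional[int]:
    """Index of a move with the best minimax value for `player` (None if no moves)."""
    other = "O" if player == "X" else "X"
    sign = 1 if player == "X" else -1  # score moves from the mover's point of view
    best, best_score = None, None
    for move in [i for i, cell in enumerate(board) if cell == " "]:
        board[move] = player
        score = sign * minimax(board, other)
        board[move] = " "
        if best_score is None or score > best_score:
            best, best_score = move, score
    return best
```

Calling `minimax([" "] * 9, "X")` on the empty board returns 0, reflecting that Tic-Tac-Toe is a draw under optimal play. For a game the size of Chess the full tree cannot be searched, which is why a practical program would cut the search off at some depth and score the resulting positions with an evaluation function.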

A milestone in AI research in games a few years before the successes of Deep 
Blue and Chinook is the backgammon software named TD-Gammon which was
developed by Gerald Tesauro in 1992. TD-Gammon employs an artificial neural
network which is trained via temporal difference learning by playing backgammon
against itself a few million times [688, 689]. TD-Gammon managed to play
backgammon at the level of a top human backgammon player. After Deep Blue, IBM’s
next success story was Watson, a software system capable of answering questions
addressed in natural language. In 2011, Watson competed on the Jeopardy! TV
game and won $1 million against former winners of the game [201].

Following the successes of AI in traditional board games the latest board game 
AI milestone was reached in 2016 in the game of Go. Soon after Chinook and 
Deep Blue, the game of Go became the new benchmark for game playing AI with 
a branching factor that approximates 250 and a vast search space many times larger 
than that of Chess. While human-level Go playing had been expected sometime in
the far future [68], already in 2016 Lee Sedol—a 9-dan professional Go player—
lost a five-game match against Google DeepMind’s AlphaGo software which featured
a deep reinforcement learning approach [629]. Just a few days before the release
of the first draft of this book—between 23 and 27 May 2017—AlphaGo won
a three-game Go match against the world’s number 1 ranking player Ke Jie, running
on a single computer. With this victory, Go was the last great classic board game
where computers have attained super-human performance. While it is possible to
construct classic-style board games that are harder than Go for computers to play,
no such games are popular for human players. 

But classic board games, with their discrete turn-based mechanics and where the
full state of the game is visible to both players, are not the only games in town, 
and there is more to intelligence than what classic board games can challenge. In 
the last decade and a half, a research community has therefore grown up around 
applying AI to games other than board games, in particular video games. A large part 
of the research in this community focuses on developing AI for playing games— 
either as effectively as possible, or in the style of humans (or a particular human), 
or with some other property. A notable milestone in video game AI playing was 
reached in 2014 when algorithms developed by Google DeepMind learned to play 
several games from the classic Atari 2600 video game console on a super-human 
skill level just from the raw pixel inputs [464]. One of the Atari games that proved
to be hard to play well with that approach was Ms Pac-Man (Namco, 1982). The 
game was practically solved a few days before the release of the second draft of the 
book (June 2017) by the Microsoft Maluuba team using a hybrid reward architecture 
reinforcement learning technique [738].

Other uses of AI in video games (as detailed in this book) have come to be very 
important as well. One of these is procedural content generation. Starting in the 
early 1980s, certain video games created some of their content algorithmically dur- 
ing runtime, rather than having it designed by humans. Two games that became very 
influential early on are Rogue (Toy and Wichmann, 1980), where dungeons and the 
placement of creatures and items in them are generated every time a new game 
starts, and Elite (Acornsoft, 1984), which stores a large universe as a set of random
seeds and creates star systems as the game is played. The great promise of games 
that can generate some of their own content is that you can get more—potentially 
infinite—content without having to design it by hand, but it can also help reduce 
storage space demands among many other potential benefits. The influence of these 
games can be seen in recent successes such as Diablo III (Blizzard Entertainment, 
2012), No Man’s Sky (Hello Games, 2016) and the Chalice Dungeons of Bloodborne
(Sony Computer Entertainment, 2015). 
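
The idea of storing only seeds and regenerating content on demand, as described above for Elite, can be illustrated with a minimal sketch. This is not Elite's actual algorithm: the function name `star_system`, the seed-mixing constant and the generated fields are all hypothetical choices made purely for illustration.

```python
import random


def star_system(galaxy_seed: int, index: int) -> dict:
    """Regenerate star system `index` deterministically from a stored seed.

    Hypothetical sketch: nothing about the system is stored, only the seed.
    """
    # Mix the galaxy seed and the system index into one per-system seed
    # (the multiplier is an arbitrary illustrative constant).
    rng = random.Random(galaxy_seed * 1_000_003 + index)
    return {
        "name": "".join(rng.choice("aeioulrstnm") for _ in range(6)).capitalize(),
        "planets": rng.randint(1, 9),
        "economy": rng.choice(["agricultural", "industrial", "high-tech"]),
    }


# The same (seed, index) pair always yields the same system.
first_visit = star_system(42, 7)
second_visit = star_system(42, 7)
```

Because the generator is deterministic, a whole universe can in principle be represented by one number per galaxy, which is the storage benefit the paragraph above refers to.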

Relatively recently, AI has also begun to be used to analyze games, and model 
players of games. This is becoming increasingly important as game developers need 
to create games that can appeal to diverse audiences, and increasingly relevant as 
most games now benefit from internet connectivity and can “phone home” to the
developer’s servers. Facebook games such as FarmVille (Zynga, 2009) were among
the first to benefit from continuous data collection, AI-supported analysis of the data
and semi-automatic adaptation of the game. Nowadays, games such as Nevermind
(Flying Mollusk, 2016) can track the emotional changes of the player and adapt the 
game accordingly. 

Very recently, research on believable agents in games has opened new horizons 
in game AI. One way of conceptualizing believability is to make agents that can 
pass game-based Turing tests. A game Turing test is a variant of the Turing test 
in which a number of judges must correctly guess whether an observed playing 
behavior in a game is that of a human or an AI-controlled game bot [263, 619].
Most notably, two AI-controlled bot entries managed to pass the game Turing test in
Unreal Tournament 2004 (Epic Games, 2004) on Turing’s centenary in 2012 [603].

In the next section we will outline the parallel developments in both academia 
and industry and conclude the historical section on game AI with ways the two 
communities managed to exchange practices and transfer knowledge for a common 
two-fold goal: the advancement of AI and the improvement of games at large.


1.2.1 Academia 

In academic game AI we distinguish two main domains and corresponding research 
activity; board games and video (or computer) games. Below, we outline the two 
domains in a chronological sequence, even though game AI research is highly active 
in both of them. 


1.2.1.1 Early Days on the Board 

When it comes to game AI research, classic board games such as Chess, Checkers 
and Go are clearly beneficial to work with as they are very simple to model in code
and can be simulated extremely fast—one can easily make millions of moves per 
second on a modern computer—which is indispensable for many AI techniques.
Also, board games seem to require thinking to play well, and have the property 
that they take “a minute to learn, but a lifetime to master”. It is indeed the case 
that games have a lot to do with learning, and good games are able to constantly 
teach us more about how to play them. Indeed, to some extent the fun in playing 
a game consists in learning it and when there is nothing more to learn we largely 
stop enjoying them [351]. This suggests that better-designed games are also better
benchmarks for artificial intelligence. As mentioned above, board games were the 
dominant domain for AI research from the early 1950s until quite recently. As we 
will see in the other parts of this book—notably in Chapter 3—board games remain
a popular game AI research domain even though the arrival of video and arcade 
games in the 1980s has shifted a large part of the focus since then.


1.2.1.2 The Digital Era

To the best of our knowledge, the first video game conference occurred at Harvard’s 
Graduate School of Education¹ in 1983. The core focus of the conference was on
the educational benefits and positive social impact of video game playing. 

The birth date of the digital game AI field can be safely considered to be around 
year 2001. The seminal article by Laird and van Lent [360], emphasizing the role
of games as the killer application for AI, established the foundations of game AI
and inspired early work in the field |l693|235l|47g|292l 1211] |M1 ISSI In 
those early days AI in digital games was mainly concerned with playing games, 
agent architectures for NPC behavior iHon [Tool . sometimes within interactive 
drama [438, 399, 412, 483, 107], and pathfinding [664]. Early work in these areas
was presented primarily in the AAAI Spring Symposia on AI and Interactive Entertainment
preceding the AIIDE (which started in 2005) and the IEEE CIG (also started
in 2005) conferences. Most of the early work in the game AI field was conducted by
researchers with AI, optimization and control background and research experience 
in adaptive behavior, robotics and multi-agent systems. AI academics used the best 
of their computational intelligence and AI tools to enhance NPC behavior in gen- 
erally simple, research-focused, non-scalable projects of low commercial value and 
perspective. 


1.2.2 Industry 

The first released video games back in the 1970s included little or nothing that we 
would call artificial intelligence; NPC behaviors were scripted or relied on simple 
rules, partly because of the primitive state of AI research, but perhaps even more 
because of the primitive hardware of the time. However, in parallel to developments 
in academia, the game industry gradually made steps towards integrating more sophisticated
AI in their games during the early days of game AI [109, 758].

A non-exhaustive list of AI methods and game features that advanced the game
AI state-of-practice in the industry [546], in chronological order, includes the first
popular application of neural networks in Creatures (Millennium Interactive, 1996)
with the aim to model the creatures’ behavior; the advanced sensory system of
guards in Thief (EIDOS, 1998); the team tactics and believable combat scenes in the
Halo series (Microsoft Studios, 2001-2017)—Halo 2 in particular popularized the
use of behavior trees in games; the behavior-based AI of Blade Runner (Virgin Interactive,
1997); the advanced opponent tactics in Half-Life (Valve, 1998); the fusion
of machine learning techniques such as perceptrons, decision trees and reinforcement
learning coupled with the belief-desire-intention cognitive model in Black and
White (EA, 2000)—see Fig. 1.1; the believable agents of The Sims series (Electronic
Arts, 2000-2017); the imitation learning Drivatar system of Forza Motorsport (MS

¹ Fox Butterfield, Video Game Specialists Come to Harvard to Praise Pac-Man; Not to Bury Him.
New York Times, May 24, 1983

Fig. 1.1 A screenshot from Black and White (EA, 2000), a highlight in game artificial intelligence
history that successfully integrated several AI methods into its design. The game features a creature
that learns through positive rewards and penalties in a reinforcement learning fashion. Further, the
creature employs the belief-desire-intention model (22^ for its decision making process during 
the game. The desires of the creature about particular goals are modeled via simple perceptrons. 
For each desire, the creature selects the belief that it has formed the best opinion about; opinions, 
in turn, are represented by decision trees. Image obtained from Wikipedia (fair use). 


Game Studios, 2005); the generation of context-sensitive behaviors via Goal Oriented
Action Planning [506]—a simplified STRIPS-like planning method—which
was specifically designed for F.E.A.R. (Sierra Entertainment, 2005) [507]; the procedurally
generated worlds of the Civilization series (MicroProse, Activision, Infogrames
Entertainment, SA and 2K Games, 1991-2016) and Dwarf Fortress (Bay 12
Games, 2006); the AI director of Left 4 Dead (Valve, 2008); the realistic gunfights of
Red Dead Redemption (Rockstar Games, 2010); the personality-based adaptation in
Silent Hill: Shattered Memories (Konami, 2010); the affect-based cinematographic
representation of multiple cameras in Heavy Rain (Quantic Dream, 2010); the neuroevolutionary
training of platoons in Supreme Commander 2 (Square Enix, 2010);
the buddy AI (named Ellie) in The Last of Us (Sony Computer Entertainment, 2013);
the companion character, Elizabeth, in BioShock Infinite (2K Games, 2013); the interactive
narratives of Blood & Laurels (Emily Short, 2014); the alien’s adaptive
behavior which adjusts its hunting strategy according to the player in Alien: Isolation
(Sega, 2014); and the procedurally generated worlds of Spelunky (Mossmouth,
LLC, 2013) and No Man’s Sky (Hello Games, 2016).
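
To make the notion of a "STRIPS-like planning method" mentioned in the list above more concrete, here is a minimal sketch of planning over actions with preconditions and effects. It is an illustrative assumption rather than the planner used in F.E.A.R.: the action names and facts are hypothetical, and the search is plain breadth-first forward search instead of the cost-guided search a production planner would normally use.

```python
from collections import deque

# Each action: (name, preconditions, facts added, facts deleted).
# All names and facts below are hypothetical examples.
ACTIONS = [
    ("draw_weapon", frozenset(), frozenset({"armed"}), frozenset()),
    ("move_to_cover", frozenset(), frozenset({"in_cover"}), frozenset()),
    ("attack", frozenset({"armed", "in_cover"}), frozenset({"target_down"}), frozenset()),
]


def plan(start, goal, actions=ACTIONS):
    """Return a shortest list of action names reaching `goal`, or None."""
    start, goal = frozenset(start), frozenset(goal)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:  # every goal fact holds in this state
            return steps
        for name, pre, add, delete in actions:
            if pre <= state:  # action is applicable
                nxt = (state - delete) | add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [name]))
    return None


combat_plan = plan(set(), {"target_down"})
```

The agent is given only a goal and a set of actions; the sequence of behaviors emerges from the search, which is what makes the resulting behavior context-sensitive.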

The key criterion that distinguishes successful AI in commercial-standard games
has always been the level of integration and interweaving of AI in the design of
the game [599, 546]. An unsuccessful coupling of game design and AI may lead
to unjustifiable NPC behaviors, break the suspension of disbelief and immediately
reduce player immersion. A typical example of such a mismatch between AI and
design is the broken navigation of bots that get stuck in a level’s dead end; in such
instances either the level design is not (re)considered appropriately to match the
design of AI or the AI is not tested sufficiently, or both. On the other hand, the
successful integration of AI in the design process is likely to guarantee satisfactory
outcomes for the playing experience. The character design process, for instance,
may consider the limitations of the AI and, in turn, absorb potential “catastrophic”
failures of it. An example of such an interwoven process is the character design
in Façade im that was driven, in part, by the limitations of the natural language
processing and the interactive narrative components of the game.

It is important to note that this book is not necessarily about game AI as defined 
and practiced in the game industry. Instead, it is primarily an academic textbook 
that refers to some of the techniques that have been used in and popularized through 
the game industry—see for instance the ad-hoc behavior authoring section of Chap- 
ter|^ The reader with an interest in the AI state of practice in the game industry is 
referred to the several introductory articles (e.g., III7III369l l available in books such 
as the game AI programming wisdom series [546, 547, 548, 549]. Another valuable
resource is the video-recorded talks from top game AI programmers which
are hosted at the AI summit² of the Game Developers Conference (GDC) and are
available at the GDC Vault.³ Finally, talks and courses mostly relevant for game AI
programmers are available through the nucl.ai conference webpage.⁴


1.2.3 The “Gap” 

During the first decade of academic game AI research, whenever researchers from
academia and developers from industry would meet and discuss their respective
work, they would arrive at the conclusion that there exists a gap between them;
the gap had multiple facets such as differences in background knowledge, practice,
trends, and state-of-the-art solutions to important problems. Academics and practitioners
would discuss ways to bridge that gap for their mutual benefit on a frequent
basis [109] but that debate would persist for many years as developments on both
ends were slow. The key message from academic AI was that the game industry
should adopt a “high risk-high gain” business model and attempt to use sophisticated
AI techniques with high potential in their games. On the other end, the central
complaint of industrial game AI regarding game AI academics has been the lack of


² http://www.gdconf.com/conference/ai.html
³ http://www.gdcvault.com/
⁴ https://nucl.ai/

domain-specific knowledge and practical wisdom when it comes to realistic problems
and challenges faced during game production. Perhaps above all there is a
difference in what is valued, with academics valuing new algorithms and new uses 
of algorithms that achieve superior performance or create new phenomena or experiences,
and AI developers in industry valuing software architectures and algorithmic
modifications that reliably support specific game designs. But what happened since 
then? Does this gap really exist nowadays or is it merely a ghost from the past? 

It is still true that the academic game AI research community and the game industry
AI development community largely work on different problems, using different
methods. There are also some topics and methods explored by the academic community
which are generally very unpopular within the game industry. Real-time adaptation
and learning in NPCs is one such example; quite a few academic researchers
are excited by the idea of NPCs that can learn and develop from their interactions
with the player and other NPCs in the game. However, AI developers in industry
point out that it would be very hard to predict what these NPCs will learn, and it
is very likely to “break the game” in the sense that it no longer works as designed.
Conversely, there are methods and problems explored in industry which most academics
do not care about, as they only make sense within the complex software
architecture of a complete game.

When thinking about the use of AI within modern video games, it is important to 
remember that most game genres have developed evolutionarily from earlier game 
designs. For example, the first platformers were released in the mid-1980s and the 
first first-person shooters and real-time strategy games in the early 1990s. At that 
time, the ability to build and deploy advanced AI was much less than it is today, 
so designers had to design around the lack of AI. These basic design patterns have 
largely been inherited by today’s games. It can therefore be said that many games 
have been designed not to need AI. For the academic who wants to build interesting 
AI for an in-game role, the best might therefore be to create new game designs that 
start from the existence of the AI. 

Taking a positive stance on the topic we would argue that any existing gap between
academic and industrial NPC AI nowadays can be viewed as a healthy indication
of a parallel progress with a certain degree of collaboration. As industry
and academia do not necessarily attempt to solve the same problems with the same
approaches it may be that NPC AI solutions emerging from industry can inspire
new approaches in academia and vice versa. In summary, the NPC AI gap is clearly
smaller in tasks that both academia and industry care about. Certain aspects of NPC
AI, however, are far from being solved in an ideal fashion and others—such as modeling
emotion in role playing games—are still at the beginning stages of investigation.
So while we can praise the NPC AI of The Elder Scrolls V: Skyrim (Bethesda
Softworks, 2011) we cannot be as positive about the companion AI of that game. 
We can view the very existence of such limitations as an opportunity that can bring 
industry and academia even closer to work on further improving existing NPC AI 
in games. 

A different take on this discussion—which is supported by some game developers
and game AI academics—is that NPC AI is almost solved for most production
tasks; some take it one step further and claim that game AI research and development
should focus solely on non-traditional uses of AI 04771167 lll . The level of AI
sophistication in recent games such as Left 4 Dead (Valve, 2008) and The Elder
Scrolls V: Skyrim (Bethesda Softworks, 2011) supports this argument and suggests
that advances in NPC AI have reached satisfactory levels for many NPC control
challenges faced during game production. Due to the rise of robust and effective
industrial game AI solutions, the convergence to satisfying NPC performances, the
support of the multidisciplinary nature of game AI and a more pragmatic and holis- 
tic view of the game AI problem, recent years have seen a shift of academic and 
industrial interests with respect to game AI. It seems that we have long reached an 
era where the primary focus of the application of AI in the domain of games is not 
on agents and NPC behaviors. The focus has, instead, started to shift towards inter- 
weaving game design and game technology by viewing the role of AI holistically 
and integrating aspects of procedural content generation and player modeling within 
the very notion of game AI [764].

The view we take in this book is that AI can help us to make better games but
that this does not necessarily happen through better, more human-like or believable
NPCs [764]. Notable examples of non-NPC AI in games include No Man’s Sky
(Hello Games, 2016) and its procedural generation of a quintillion different planets
and Nevermind (Flying Mollusk, 2016) with its affective-based game adaptation
via a multitude of physiological sensors. But there might be other AI roles within
game design and game development that are still to be found by AI. Beyond playing
games, modeling players or generating content, AI might be able to play the role of
a design assistant, a data analyst, a playtester, a game critic, or even a game director.
Finally, AI could potentially play and design games as well as model their players
in a general fashion. The final chapter of this book is dedicated to these
frontier research areas for game AI.


1.3 Why Games for Artificial Intelligence 

There are a number of reasons why games offer the ideal domain for the study of 
artificial intelligence. In this section, we list the most important of them. 


1.3.1 Games Are Hard and Interesting Problems 

Games are engaging due to the effort and skills required from people to complete 
them or, in the case of puzzles, solve them. It is that complexity and interestingness 
of games as a problem that makes them desirable for AI. Games are hard because 
their finite state spaces, such as the possible strategies for an agent, are often vast. 
Their complexity as a domain rises as their vast search spaces often feature small
feasible spaces (solution spaces). Further, it is often the case that the goodness of
any game state is hard (or even impossible) to assess properly.

From a computational complexity perspective, many games are NP-hard (NP
refers to “nondeterministic polynomial time”), meaning that the worst-case complexity
of “solving” them is very high. In other words, in the general case an algorithm
for solving a particular game could run for a very long time. Depending on a
game’s properties complexity can vary substantially. Nevertheless, the list of games
that are NP-hard is rather long and includes games such as the two-player incomplete
information Mastermind game [733, 660], the Lemmings (Psygnosis, 1991)
arcade game [334] and the Minesweeper game by Microsoft 0311 . It should be
noted that this computational complexity characterization has little to do with how
hard the games are to play for humans, and does not necessarily say much about
how well heuristic AI methods can play them. However, it is clear that at least in
theory, and for arbitrary-size instances, many games are very hard.
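
As a small illustration of why worst-case running time can explode, the sketch below checks whether a partially revealed Minesweeper board is consistent, i.e., whether some placement of mines on the unknown cells agrees with every revealed number. It is a naive brute-force check written for this illustration (the board encoding and function names are assumptions, not taken from the cited work), and it tries up to 2^k placements for k unknown cells.

```python
from itertools import product


def neighbours(r, c, rows, cols):
    """Yield the up-to-eight cells adjacent to (r, c) that lie on the board."""
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr or dc) and 0 <= r + dr < rows and 0 <= c + dc < cols:
                yield r + dr, c + dc


def is_consistent(board):
    """board: list of equal-length strings; digits are clues, '?' is unknown.

    Returns True if at least one mine placement on the '?' cells
    matches every clue. Runs in time exponential in the number of '?'.
    """
    rows, cols = len(board), len(board[0])
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    unknown = [(r, c) for r, c in cells if board[r][c] == "?"]
    clues = [(r, c, int(board[r][c])) for r, c in cells if board[r][c].isdigit()]
    for bits in product((False, True), repeat=len(unknown)):
        mines = {cell for cell, bit in zip(unknown, bits) if bit}
        if all(
            sum(n in mines for n in neighbours(r, c, rows, cols)) == k
            for r, c, k in clues
        ):
            return True
    return False
```

A board with 10 unknown cells needs at most 1,024 checks, but one with 100 unknown cells could need about 10^30, which is the kind of growth the complexity classification above refers to. Heuristic methods may still do well on typical instances.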

The investigation of AI capacity in playing games that are hard and complex has
been benchmarked via a number of milestone games. As mentioned earlier, Chess
and (to a lesser degree) Checkers have traditionally been seen as the “drosophila for
AI research” even from the early days of AI. After the success of Deep Blue and
Chinook in these two games we gradually invented and cited other more complex
games as AI “drosophilae”, or universal benchmarks. Lemmings has been characterized
as such; according to McCarthy [446] it “connects logical formalizations
with information that is incompletely formalizable in practice”. In practice, games
for which better APIs have been developed—such as Super Mario Bros (Nintendo,
1985)⁵ and StarCraft (Blizzard Entertainment, 1998)—have become more popular
benchmarks.

The game of computer Go has also been another core and traditional game AI
benchmark with decades of active research. As a measure of problem complexity a
typical game in Go has about 10^170 states. The first AI feature extraction investigations
in Go seem to date back to the 1970s [798]. The game received a lot of research
attention during several world computer Go championships up until the recent success
of AlphaGo [629]. AlphaGo managed to beat two of the best Go professional
human players using a combination of deep learning and Monte Carlo tree search.
In March 2016, AlphaGo won against Lee Sedol and in May 2017 it won all three
games against the world’s number 1 ranked player, Ke Jie.

The StarCraft (Blizzard Entertainment, 1998) real-time strategy game can be
characterized as perhaps the single hardest game for computers to play well. At the
time of writing this book the best StarCraft bots only reach the level of amateur
players.⁶ The complexity of the game derives mainly from the multi-objective task
of controlling multiple and dissimilar units in a game environment of partial information.
While it is not trivial to approximate the state space of StarCraft, according
to a recent study [729], a typical game has at least 10^1685 possible states. In comparison,
the number of protons in the observable universe is only about 10^80 Gm.

⁵ Note that the original game title contains a dot, i.e., Super Mario Bros.; for practical reasons,
however, we will omit the dot when referring to the game in the remainder of this book.

⁶ http://www.cs.mun.ca/~dchurchill/starcraftaicomp/

The number of StarCraft’s possible states sounds huge but, interestingly enough,
its search space can be of manageable size if represented by bytes. On that basis, 
we require about 700 bytes of information to represent the StarCraft search space 
whereas the number of protons in the known universe is equivalent to the number of 
configurations of about 34 bytes. 
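
The byte figures above can be checked with a short calculation. The sketch assumes the state count of 10^1685 for StarCraft and roughly 10^80 protons in the observable universe; the helper name `bytes_to_index` is hypothetical. It computes how many whole bytes are needed to give every state a distinct binary label.

```python
def bytes_to_index(num_states: int) -> int:
    """Smallest number of whole bytes that can label `num_states` distinct states."""
    bits = (num_states - 1).bit_length()  # bits needed for labels 0 .. num_states - 1
    return (bits + 7) // 8                # round up to whole bytes


starcraft_bytes = bytes_to_index(10 ** 1685)  # assumed StarCraft state count
proton_bytes = bytes_to_index(10 ** 80)       # assumed proton count
```

Equivalently, the number of bytes is about log2(N) / 8, so 1685 decimal digits correspond to roughly 1685 × 3.32 / 8 ≈ 700 bytes. Note that this is the size of one state label, not the memory needed to enumerate the space.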

One could of course design games that are harder on purpose, but there is no 
guarantee anyone would want to play those games. When doing AI research, work- 
ing on games that people care about means you are working on relevant problems. 
This is because games are designed to challenge the human brain and successful 
games are typically good at this. StarCraft (Blizzard Entertainment, 1998)—and 
its successor StarCraft II (Blizzard Entertainment, 2010)—are played by millions 
of people all over the world, with a very active competition scene of elite professional
players and even dedicated TV channels such as OGN⁷—a South Korean
cable television channel—or Twitch channels⁸ that specialize in broadcasting video
game-related content and e-sports events.

Many would claim that StarCraft (Blizzard Entertainment, 1998) is the next major
target for AI research on playing to win. In academia, there is already a rich
body of work on algorithms for playing (parts of) StarCraft [504, 569, 505, 124],
or generating maps for it [712]. Beyond academia, industrial AI leaders Google
DeepMind and Facebook seem to be in agreement and on a similar scientific mission.
DeepMind recently announced that StarCraft II will be one of their major new
testbeds, after their success at training deep networks to play Atari games in the
Arcade Learning Environment⁹ (ALE) [40] framework. At the time of writing this
book, DeepMind in collaboration with Blizzard Entertainment opened up StarCraft
II to AI researchers for testing their algorithms.¹⁰ Facebook AI Research has led the
development of TorchCraft IMl—a bridge between the deep learning Torch library
and StarCraft—and recently published their first paper on using machine learning
to learn to play StarCraft [729], showing that they take this challenge seriously.
Another industrial game AI research lab collaborating with academia on solving
StarCraft is hosted in Alibaba [523]. Given the game’s complexity, it is unlikely we
will conquer all of it soon [234] but it is a game through which we expect to see AI
advancements in the years to come.


1.3.2 Rich Human-Computer Interaction 

Computer games are dynamic media by definition and, arguably, offer one of the 
richest forms of human-computer interaction (HCI); at least at the time of writing 
this book. The richness of interaction is defined in terms of the available options 


⁷ http://ch.interest.me/ongamenet/
⁸ For instance, https://www.twitch.tv/starcraft
⁹ http://www.arcadelearningenvironment.org/
¹⁰ Follow developments at: https://deepmind.com/blog/

a player has at any given moment and the ways (modalities) a player can interact
with the medium. The available options for the player are linked to the game action 
space and the complexity associated with it in games such as StarCraft II (Blizzard 
Entertainment, 2010). Further, the modalities one may use to interact with games 
nowadays extend beyond the traditional keyboard, mouse and tablet-like haptics, to 
game controllers, physiology such as heart rate variability, body movement such as 
body stance and gestures, text, and speech. As a result, many games would easily
top the list of information bits exchanged between them and their users per second
compared to any other HCI medium; however, such comparative studies are not 
currently available to further support our claims. 

Clearly, as we will see later in this book, games offer one of the best and most 
meaningful domains for the realization of the affective loop, which defines a framework
that is able to successfully elicit, detect and respond to the cognitive, behavioral
and emotive patterns of its user [670]. The potential that games have to influence
players is mainly due to their ability to place the player in a continuous mode of
interaction with the game which elicits complex cognitive, affective and behavioral 
responses to the player. This continuous interaction mode is enriched by fast-paced 
and multimodal forms of user interactivity that are often possible in games. As every 
game features a player—or a number of players—the interaction between the player 
and the game is of key importance for AI research as it gives algorithms access to 
rich player experience stimuli and player emotional manifestations. Such complex 
manifestations, however, cannot trivially be captured by standard methods in machine
learning and data science. Undoubtedly, the study of game-player interaction
via artificial intelligence not only advances our knowledge about human behavior
and emotion but also contributes to the design of better human-computer interaction.
As a result, it further pushes the boundaries of AI methods in order to address
the challenges of game-based interaction. 


1.3.3 Games Are Popular 

While video games, back in the 1980s, were introduced as a niche activity for those 
having access to a video arcade or to consoles such as Atari 2600, they gradually 
turned into a multi-billion industry generating, in the 2010s, a global market revenue 
higher than any other form of creative industry, including film and music. At the
time of writing, games generate a worldwide total of almost $100 billion in revenue
which is expected to rise to approximately $120 billion by 2019.¹¹

But why did games become so popular? Beyond the obvious argument of games
being able to enhance a user’s intrinsic motivation and engagement by offering interactivity
capacities with a virtual environment, it was the technological advancements
over the last 40 years that drastically changed the demographics of players
EU- Back in the early 1980s games used to be played solely in arcade entertainment

¹¹ See, for instance, the global games market report by Newzoo:
https://newzoo.com/solutions/revenues-projections/global-games-market-report/

machines; nowadays, however, they can be played using a multitude of devices
including a PC (e.g., multi-player online or casual games), a mobile phone, a tablet,
a handheld device, a virtual reality device, or a console (and obviously still an arcade
entertainment machine!). Beyond the technological advancements that fostered
accessibility and democratized gameplay, it is also the culture that follows a new
medium and develops it into a new form of art and expression. Not only the independent
scene of game design and development¹² has contributed to this culture, but
also the outreach achieved from the multitude of purposes and objectives of games 
beyond mere entertainment: games for art, games as art, games for a change, phys- 
ical interactive games, games for education, games for training and health, games 
for scientific discovery, and games for culture and museums. In brief, not only are 
games everywhere and present in our daily lives but they also shape our social and 
cultural values at large—as for instance evidenced by the recent massive success of 
Pokemon Go (Niantic, 2016). As a byproduct of their popularity games offer easy 
access to people with world-class performance in the domain. Experts (i.e., profes- 
sional players) for many board and digital games that are world-ranked according to 
their gameplay performance have participated regularly in competitions against AI 
algorithms; examples include Garry Kasparov (Chess), and Lee Sedol and Ke Jie 
(Go). 

As games become more popular, grow in quantity, and become more complex, 
new AI solutions are constantly required to meet the new technological challenges.
This is where AI meets a domain with a strong industrial backing and a desire to
support sophisticated technology for bettering player experience. Furthermore, very
few domains for AI offer the privilege of daily accessibility to new content and data 
from their popular use. But let us look at these two aspects in more detail below. 


1.3.3.1 Popular Means More Content 

The more people play (more) games, the more content is required for games. Content
takes effort to create but, over the years, mechanisms have been developed that
allow both machines and players to design and create various forms of content in
games. Games have gradually developed to be content-intensive software applications
that demand content that is both of direct use in the game and of sufficient
novelty. The overwhelming demand for new and novel gaming experiences from a 
massive community of users constantly pushes the boundaries of human and com- 
putational creativity to new grounds; and naturally AI at large. 

Content in games, beyond any other form of multimedia or software application,
not only covers all possible forms of digital content such as audio, video, image,
and text but it also comes in massive numbers of different resolutions and representations.
Any algorithm that attempts to retrieve and process the variety and amount
of content within or across games is directly faced with the challenges of interoperability
and content convergence as well as scalability caused by such big data sets.

Compare this with a typical robot simulator, where all the environments would
have to be painstakingly hand-crafted or adapted from data collected from the real
world. When using games as testbeds, there is no such content shortage.


1.3.3.2 Popular Means More Data 

Massive content creation (either by games or players) is one major effect of games’ 
popularity; the other is massive data generation of game playthroughs and player 
behavior. Since the late 2000s, game companies have had access to accurate game 
telemetry services that allow them to track and monitor player purchases, churn and 
re-engagement, or the progress of play for debugging either the game or the players’ 
experience. The algorithmic challenges met here follow the general challenges 
of big data and big data mining research [445], which include data filtering during 
data acquisition, metadata generation, information extraction from erroneous and 
missing data, automatic data analysis across dissimilar datasets, appropriate declarative 
query and mining interfaces, scalable mining algorithms, and data visualization 
EU-. Luckily enough, some of these datasets are nowadays openly available 
for game analytics and game data mining research. Indicatively, in March 2017, the 
OpenDota project—a community-maintained open source Dota 2 data platform— 
released a sanitized archive of over a billion matches (!) of Dota 2 (Valve Corporation, 
2013) that were played between March 2011 and March 2016. 


1.3.4 There Are Challenges for All AI Areas 

Unlike some more narrow benchmarks, games challenge all core areas of AI. This 
can be seen by taking a number of broadly accepted areas of AI and discussing the 
challenges available for those areas in games. Signal processing, for starters, meets 
great challenges in games. Data from players, for instance, not only come in different 
resolutions—in-game events vs. head pose vs. the player’s physiology—they 
also originate from multiple modalities of fast-paced interaction in an environment 
that elicits complex cognitive and affective patterns in the player. Multi-modal interaction 
and multi-modal fusion are non-trivial problems when building embodied 
conversational agents and virtual characters. Further, the complexity of the signal 
processing task in games is augmented due to the spatio-temporal nature of the signals, 
which is caused by the rich and fast-paced interaction with the game. 

As discussed in the introductory section of this chapter, Checkers, Chess, Jeopardy!, 
Go and arcade games mark a historical trace of core major milestones for machine 
learning (Go and arcade games), tree search (Checkers and Chess), knowledge 
representation and reasoning (Jeopardy!) and natural language processing 
(Jeopardy!, Kinect games) and, in turn, they resulted in major breakthroughs for AI. 
This historical association between AI accomplishments and games already provides 
clear evidence that all the above areas have traditionally been challenged by 
games. While the full potential of machine learning remains to be discovered in 
games such as StarCraft II (Blizzard Entertainment, 2010), natural language processing 
(NLP) has been challenged deeply through games involving narrative and 
natural language input. NLP is challenged even further in game environments that 
wish to realize forms of interactive storytelling [148]. 


https://www.opendota.com/ 

https://blog.opendota.com/2017/03/24/datadump2/ 

Finally, when it comes to planning and navigation, games have traditionally 
offered environments of high and increasing complexity for algorithms to be tested 
in. While games such as StarCraft clearly define major milestones for planning 
algorithms, navigation and pathfinding have reached a certain degree of maturity 
through simulated and roborealistic game environments featuring multiple entities 
(agents). An additional benefit of games as a domain for behavioral planning is that 
they offer a realistic yet far more convenient and cheaper testbed compared to 
robotics. Beyond the extensive testing and advancement of variants of A* through 
games, popular and highly effective tree search variants such as the Monte Carlo 
tree search [77] algorithm have been invented in response to problems posed by 
game-playing. 


1.3.5 Games Best Realize Long-Term Goals of AI 

One of the long-standing questions that AI is faced with is what is the ultimate 
long-term goal for AI? While numerous debates and books have been dedicated to 
this topic, the collaborative effort of Wikipedia authors addressing this question 
reveals the areas of social intelligence and affective interaction, (computational) 
creativity, and general intelligence as the most critical long-term goals of AI. Our 
reference has been critiqued for systemic bias (in any controversial question such as 
the one above); we argue, however, that any reference on this topic would be subjective 
anyhow. Without aiming to be exclusive or biased, we believe that the three 
aforementioned areas collectively contribute to better AI systems and we discuss 
why games best realize these three goals below. These three long-term goals define 
frontier research areas for game AI and are further elaborated on in the last chapter 
of this book. 


1.3.5.1 Social and Emotional Intelligence 

Affective computing [530] is the multidisciplinary area of study across computer 
science, cognitive science and psychology that investigates the design and development 
of intelligent software that is able to elicit, detect, model, and express emotion 
and social intelligence. The ultimate aim of affective computing is the realization of 
the so-called affective loop [670] which, as we covered earlier, defines a system that 
is able to successfully elicit, detect and respond to the emotions of its user. Naturally, 
both emotive and social aspects of intelligence are relevant for a system that 
realizes the affective loop. 

Games can offer a highly meaningful realization of the affective loop and affective 
interaction [781]. Games are by definition both entertaining (whether used 
for pure satisfaction, training or education) and interactive activities that are played 
within fantasy worlds. Thus, any limitations of affective interaction—such as the 
difficulty of justifying affective-based game decisions or alterations of content to the 
user—are absorbed naturally. For example, an erroneous affective response of a 
game character can still be justified if the design of the character and the game context 
do not break the suspension of disbelief of the player—during which the player 
ignores the medium and the interaction for the sake of her enjoyment. Further, games 
are designed to offer affective experiences which are influenced by player feedback 
and players are willing to go through, e.g., frustrating, anxious, and fearful 
episodes of play for experiencing involvement. To that end, a user under gaming 
conditions—more than any other form of human-computer interaction—is generally 
open to affective-based alterations of the interaction and influences of his/her 
emotional state. 


1.3.5.2 Computational Creativity 

Computational creativity studies the potential of software to autonomously generate 
outcomes that can be considered creative or algorithmic processes that are deemed 
to be creative [54, 754]. Computer games can be viewed as the killer application 
domain for computational creativity [381]. It is not only their unique features that 
we covered in earlier sections of this chapter—i.e., being highly interactive, dynamic 
and content-intensive software applications. Most importantly it is their multifaceted 
nature. In particular, it is the fusion of the numerous and highly diverse creative 
domains—visual art, sound design, graphic design, interaction design, narrative, 
virtual cinematography, aesthetics and environment beautification—within a single 
software application that makes games the ideal arena for the study of computational 
creativity. It is also important to note that each art form (or facet) met in games elicits 
different experiences in its users; their fusion in the final software targeting a rather 
large and diverse audience is an additional challenge for computational creativity. 

As a result, the study of computational creativity within and/or for computer games 
[381] advances both the field of AI and the domain of games. Games can, first, be 
improved as products via computational creations (for) and/or, second, be used as 
the ultimate canvas for the study of computational creativity as a process (within). 
Computer games not only challenge computational creativity but they also provide a 
creative sandbox for advancing the field. Finally, games can offer an opportunity for 
computational creativity methods to be extensively assessed via a huge population 
of users of commercial-standard products of high impact and financial value. 


1.3.5.3 General Intelligence 

AI has investigated the general intelligence capacity of machines within the domain 
of games more than any other domain thanks to the ideal properties of games for 
that purpose: controlled yet interesting and computationally hard problems [598]. 
In particular, the capacity of AI to play unseen games well—i.e., general game 
playing—has seen a number of advancements in recent years. Starting with the 
general game playing competition [223], focusing on board games and similar discrete 
perfect information games, we now also have the Arcade Learning Environment 
[40] and the General Video Game AI Competition [528], which offer radically 
different takes on arcade video games. Advancements vary from the efforts to create 
game description languages suitable for describing games used for general game 
playing [533, 223, 400, 691, 354, 596, 181, 429] to the establishment of a set of 
general video game AI benchmarks [223, 528, 40] to the recent success of deep 
Q-learning in playing arcade games with human-level performance just by processing 
the screen’s pixels [464]. 

While general game playing is studied extensively and constitutes one of the key 
areas of game AI [785], we argue that the focus of generality solely with regard to 
the performance of game-playing agents is very narrow with respect to the spectrum 
of roles for general intelligence in games. The types of general intelligence required 
within game development include game and level design as well as player behavior 
and experience modeling. Such skills touch upon a diverse set of cognitive and 
affective processes which have until now been ignored by general AI in games. 
For general game AI to be truly general and advance AI algorithmically, it needs 
to go beyond game playing while retaining the focus on addressing more than a 
single game or player [718]. We further argue that the challenge of bringing together 
different types of skillsets and forms of intelligence within autonomous designers 
of games can not only advance our knowledge about human intelligence but also 
advance the capacity of general artificial intelligence. 


1.4 Why Artificial Intelligence for Games 

The various uses of AI in games are beneficial for the design of better games for a 
number of reasons. In this section we focus on the benefits obtained by allowing AI 
to play a game, to generate content and to analyze player experience and behavior. 


1.4.1 AI Plays and Improves Your Game 

AI can improve games in several ways by merely playing them. The game industry 
usually receives praise for the AI of their games—in particular, the non-player or 
opponent AI—when the AI of the game adds to the commercial value of the game, 
it contributes to better game reviews, and it enhances the experience of the player. 
Whether the underlying AI is based on a simple behavior tree, a utility-based AI 
or alternatively on a sophisticated machine learned reactive controller is of limited 
relevance as long as it serves the aforementioned purposes. An unconventional and 
effective solution to an NPC task can often be a critical factor that shapes manage- 
ment, marketing and monetization strategies during and after production. 

As we will see in Chapter 3, AI plays games with two core objectives in mind: 
play well and/or play believably (or human-like, or interestingly). Further, AI can 
control either the player character or the non-player character of the game. AI that 
plays well as a player character focuses on optimizing the performance of play— 
performance is measured as the degree to which a player meets the objectives of 
the game solely. Such AI can be of tremendous importance for automatic game 
testing and for the evaluation of the game design as a whole. AI that plays well 
as a non-player character, instead, can empower dynamic difficulty adjustment and 
automatic game balancing mechanisms that will in turn personalize and enhance the 
experience for the player (as in 1165III among many). If the focus of AI is shifted to 
controlling player characters that play believably or human-like (as in ||9^I7 1911264 II 
among many) then AI can serve as a means for player experience debugging or as a 
demonstration of realistic play for design purposes. Finally, a game that features a 
rich interaction with NPCs can only benefit from AI that controls NPCs which are 
expressive and depict human-like and believable behaviors (as in [563, 683, 762] 
among many). 


1.4.2 More Content, Better Content 

There are several reasons for game designers and developers to be interested in 
AI and, in particular, in content generation as covered in detail in Chapter 4. The 
first and most historical reason is memory consumption. Content can typically 
be compressed by keeping it “unexpanded” until needed. A good example is the 
classic space trading and adventure game Elite (Acornsoft, 1984), which managed 
to keep hundreds of star systems in a few tens of kilobytes of memory available 
on the hardware. Further, content generation might foster or further inspire human 
creativity and allow the emergence of completely new types of games, game genres 
or entirely new spaces of exploration and artistic expression [381]. Moreover, if 
new content can be generated with sufficient variety, quality and quantity, then it 
may become possible to create truly endless games with ultimate replay value. 
Finally, when content generation is associated with aspects of play we can expect 
personalized and adaptive play to emerge via the modification of content. 
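
The idea of keeping content "unexpanded" until needed can be sketched with seeded generation. The following is a minimal Python illustration and not the algorithm used in Elite: the function names `star_system` and `galaxy`, the syllable table, and the attribute ranges are all assumptions made for this example. Only a seed and an index need to be stored; the description of a star system is regenerated on demand and comes out identical every time.

```python
import random

# Hypothetical syllable table used to build names; any fixed table would do.
SYLLABLES = ["la", "ve", "ti", "on", "ra", "qu", "es", "di", "so", "an"]


def star_system(galaxy_seed: int, index: int) -> dict:
    """Expand one star system from a seed and an index (illustrative only)."""
    # A private generator seeded from (galaxy_seed, index) makes the result
    # reproducible and independent of the order in which systems are requested.
    rng = random.Random(galaxy_seed * 1_000_003 + index)
    syllable_count = rng.randint(2, 4)
    name = "".join(rng.choice(SYLLABLES) for _ in range(syllable_count))
    return {
        "name": name.capitalize(),
        "x": rng.randrange(256),          # assumed map coordinates
        "y": rng.randrange(256),
        "tech_level": rng.randint(1, 15),  # assumed attribute range
    }


def galaxy(galaxy_seed: int, size: int = 256) -> list:
    """Expand a whole galaxy; nothing but the seed needs to be stored."""
    return [star_system(galaxy_seed, i) for i in range(size)]
```

Calling `star_system(42, 7)` twice returns the same dictionary, so the game can discard a system when the player leaves it and rebuild it later. The design choice is a trade of computation for memory.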

Unlike other areas of game AI—such as general game playing which might be 
considered more of an academic pursuit—content generation is a commercial necessity 
BsTl. Prior to the academic interest in content generation—which is rather 
recent [704, 720, 616]—content generation systems had a long history of supporting 
commercial-standard games for creating engaging yet unpredictable game experiences 
but, most importantly, lessening the burden of manual game content creation. 
Naturally, games that feature sophisticated content generation systems can garner 
praise for their technologies—as in Diablo III (Blizzard, 2012)—or even build 
an entire marketing campaign on content generation—like No Man’s Sky (Hello 
Games, 2016). 


1.4.3 Player Experience and Behavioral Data Analytics 

The use of AI for the understanding of player experience can drive and enhance 
the design process of games. Game designers usually explore and test a palette of 
mechanics and game dynamics that yield experience patterns they desire to put the 
player through. Player states such as engagement, fear and stress, frustration, anticipation, 
and challenge define critical aspects of the design of player experience, 
which is dependent on the genre, the narrative and the objectives of the game. As a 
result, the holy grail of game design—that is player experience—can be improved 
and tailored to each player but also augmented via richer experience-based interac- 
tion. Further, as a direct consequence of better and faster design, the whole game 
development process is boosted and improved. 

Beyond the experience of the player, data derived from games, their use and their 
players provide a new and complementary way of designing games, of making man- 
agerial and marketing decisions about games, of affecting the game production, and 
of offering a better customer service [178]. Any AI-informed decisions about the 
future of a game’s design or development are based on evidence rather than intuition, 
which showcases the potential of AI—via game analytics and game data mining—for 
better design, development and quality assurance procedures. In summary, 
as we will see in the remaining chapters of this book, AI-enabled and data-driven 
game design can directly contribute to better games. 


1.5 Structure of This Book 

We structured this book into three main parts. In the first part (Chapter 2) we outline 
the core game AI methods that are important for the study of AI in and for games. 
In the second part of the book we ask the question: how can AI be used in games? 
Answers to this question define the main game AI areas identified and covered as 
corresponding chapters: 

• AI can play games (Chapter 3). 

• AI can generate content (Chapter 4). 

• AI can model players (Chapter 5). 

In the final part of the book, we attempt a synthesis of the game AI areas that 
make up this field and discuss the research trends in what we envisage as the game 
AI panorama (Chapter 6). Building on this synthesis, we conclude the book with a 
chapter dedicated to the research areas we view as largely unexplored and important 
for the study of game AI, namely frontier game AI research areas (Chapter 7). An 
illustration of how the different chapters fit together in this book is presented in 
Fig. 1.2. The readers of this book may wish to skip parts that are not of interest or 
not appropriate given their background. For instance, readers with a background in 
artificial intelligence may wish to skip the first part, while readers who wish to get a 
rapid overview of the game AI field or a glimpse of frontier research trends in game 
AI can solely focus on the final part of the book. 

Fig. 1.2 Illustrating the associations among the chapters of this book. Light gray, blue and red 
areas represent chapters of Part I, Part II, and Part III of the book, respectively. 


1.5.1 What We (Don’t) Cover in This Book 

The list of core uses of AI in games identified in this book should not be regarded 
as complete and inclusive of all potential areas of game AI research. It could also be 
argued that the list of areas we cover is arbitrary. However, this could likely be said 
of any research field in any discipline. (In software engineering, software design 
overlaps with software requirements analysis, and in cognitive psychology, memory 
research overlaps with attention research.) While it might be possible to perform an 
analysis of this research field so that the individual areas have minimal or no overlap, 
this would likely lead to a list of artificial areas that do not correspond to the areas 
game AI students, researchers and practitioners perceive themselves to be working 
in. It could also be argued that we are omitting certain areas. For example, we only 
briefly discuss the topic of pathfinding in games, whereas some other authors see 
this as a core concern of game AI. In our view, pathfinding is a relatively isolated 
area with restricted interaction with the other uses of AI in games. Further, pathfinding 
has been covered substantially in other game AI textbooks [109, 461, 62] already. 
Another example is the area of computational narrative, which is viewed 
as a domain for content generation and covered only relatively briefly in Chapter 4. 
It would certainly be possible to write a whole textbook about computational narrative, 
but that would be someone else’s job. Beyond particular application areas of 
game AI, in Chapter 2 we cover a number of popular methods used in the field. The 
list of methods is not inclusive of all AI areas that can find application in games; it 
is a list, however, we consider sufficient for covering the theoretical foundations of 
a graduate game AI course. In that regard, we only partially cover planning methods 
and probabilistic methods such as Bayesian approaches and methods based on 
Markov chains, respectively, in Chapterj^and Chapterj^ 

Another important note is the relationship of this book with the general areas of 
game theory [214, 472, 513] and other related AI research areas such as multiagent 
systems [626]. Game theory studies mathematical models of rational decision 
makers within abstract games, for the analysis of economic or social behavior in adversarial 
1147III or cooperative [108] settings. More specifically, game theory focuses 
on characterizing or predicting the actions of rational or bounded-rational agents, 
and studies the related emerging “game solution concepts”—such as the celebrated 
Nash equilibrium [478]. While the book does not cover these areas in detail, as they 
are peripheral to the aims of game AI as currently envisioned, we nevertheless be- 
lieve that it would be fruitful to incorporate foundational ideas and concepts from 
game theory and multiagent systems research in the game AI field. Particular game 
AI areas such as game-playing and player (or opponent) modeling [214] could benefit 
from theoretical models of game-playing [214] and empirical implementations 
of agent-based systems. Similarly, we believe that game AI research and practice 
can only help in advancing work on theoretical game theory and multiagent systems. 
Of course, interweaving these fields properly with the current stream of game 
AI research is a non-trivial exercise, given the different focus and paths these fields 
have taken. Further, there are limits to the degree to which theoretical models can capture 
the complexity of games covered in game AI. However, game theory undeniably 
constitutes a key theoretical pillar for the study of rational decision making, and 
rational decision making is arguably key for winning in games. Some instances of 
economic game theory that found successful applications in game AI include the 
various implementations of theoretical models for playing abstract, card and board 
games—notably a version of Poker was used as a testbed of game theory by von 
Neumann and Morgenstern IMl as far back as 1944 g06l. Some of these implementations 
are discussed in Chapter 3. Another example that bridged rational 
decision making theory with game AI is the procedural personas approach covered 
in Chapter]^ 

Finally, note that this book is the effort of two authors with a high degree of 
consensus among them and not an edited volume of several authors. As such, it 
contains subjective views on how the game AI field is positioned within the greater 
AI research scene and how the game AI areas are synthesized. Your mileage may 
vary. 


1.6 Summary 

AI has a long-standing and healthy relationship with games. AI algorithms have 
been advanced or even invented through games. Games, their design and development, 
in turn, have benefited largely from the numerous roles AI has taken in games. 
This book focuses on the main uses of AI in games, namely, for playing games, for 
generating content, and for modeling players, which are covered extensively in the 
following chapters. Before delving into the details of these AI uses, in the next chapter 
we outline the core methods and algorithms used in the field of game artificial 
intelligence. 


Chapter 2 

AI Methods 


This chapter presents a number of basic AI methods that are commonly used in 
games, and which will be discussed and referred to in the remainder of this book. 
These are methods that are frequently covered in introductory AI courses—if you 
have taken such a course, it should have exposed you to at least half of the methods 
in this chapter. It should also have prepared you for easily understanding the other 
methods covered in this chapter. 

As noted previously, this book assumes that the reader is already familiar with 
core AI methods at the level of an introductory university course in AI. Therefore, 
we recommend that you make sure that you are at least cursorily familiar with the 
methods presented in this chapter before proceeding to read the rest of the book. The 
algorithm descriptions in this chapter are high-level descriptions meant to refresh 
your memory if you have learned about the particular algorithm at some previous 
point, or to explain the general idea of the algorithm if you have never seen it before. 
Each section comes with pointers to the literature, either research papers or other 
textbooks, where you can find more details about each method. 

In this chapter we divide the relevant parts of AI (for the purposes of the book) 
into six categories: ad-hoc authoring, tree search, evolutionary computation, supervised 
learning, reinforcement learning and unsupervised learning. In each section 
we discuss some of the main algorithms in general terms, and give suggestions for 
further reading. Throughout the chapter we use the game of Ms Pac-Man (Namco, 
1982) (or Ms Pac-Man for simplicity) as an overarching testbed for all the algorithms 
we cover. For the sake of consistency, all the methods we cover are employed 
to control Ms Pac-Man’s behavior even though they can find a multitude of other 
uses in this game (e.g., generating content or analyzing player behavior). While 
a number of other games could have been used as our testbed in this chapter, we 
picked Ms Pac-Man for its popularity and its game design simplicity as well as for 
its high complexity when it comes to playing the game. It is important to remember 
that Ms Pac-Man is a non-deterministic variant of its ancestor Pac-Man (Namco, 
1980) which implies that the movements of ghosts involve a degree of randomness. 

In Section 2.1 we go through a quick overview of two key overarching components 
of all methods in this book: representation and utility. Behavior authoring, 
covered in Section 2.2, refers to methods employing static ad-hoc representations 
without any form of search or learning such as finite state machines, behavior trees 
and utility-based AI. Tree search, covered in Section 2.3, refers to methods that 
search the space of future actions and build trees of possible action sequences, often 
in an adversarial setting; this includes the Minimax algorithm, and Monte Carlo tree 
search. Covered in Section 2.4, evolutionary computation refers to population-based 
global stochastic optimization algorithms such as genetic algorithms, or evolution 
strategies. Supervised learning (see Section 2.5) refers to learning a model 
that maps instances of datasets to target values such as classes; target values are 
necessary for supervised learning. Common algorithms used here are backpropagation 
(artificial neural networks), support vector machines, and decision tree learning. 
Reinforcement learning is covered in Section 2.6 and refers to methods that solve 
reinforcement learning problems, where a sequence of actions is associated with 
positive or negative rewards, but not with a “target value” (the correct action). The 
paradigmatic algorithm here is temporal difference (TD) learning and its popular in- 
stantiation Q-learning. Section 5.6.3 outlines unsupervised learning which refers 
to algorithms that find patterns (e.g., clusters) in datasets that do not have target 
values. This includes clustering methods such as k-means, hierarchical clustering 
and self-organizing maps as well as frequent pattern mining methods such as Apri- 
ori and generalized sequential patterns. The chapter concludes with a number of 
notable algorithms that combine elements of the algorithms above to yield hybrid 
methods. In particular we cover neuroevolution and TD learning with ANN function 
approximation as the most popular hybrid algorithms used in the field of game AI. 


2.1 General Notes 

Before detailing each of the algorithm types we outline two overarching elements 
that bind together all the AI methods covered in this book. The first is the algorithm’s 
representation; the second is its utility. On the one hand, any AI algorithm 
somehow stores and maintains knowledge obtained about a particular task at hand. 
On the other hand, most AI algorithms seek to find better representations of knowledge. 
This seeking process is driven by a utility function of some form. We should 
note that the only methods that make no use of a utility are those that employ static 
knowledge representations such as finite state machines or behavior trees. 


2.1.1 Representation 

Appropriately representing knowledge is a key challenge for artificial intelligence 
at large and it is motivated by the capacity of the human brain to store and retrieve 
obtained knowledge about the world. The key questions that drive the design of 
representations for AI are as follows. How do people represent knowledge and how 
can AI potentially mimic that capacity? What is the nature of knowledge? How 
generic can a representation scheme be? General answers to the above questions, 
however, are far from trivial at this point. 

As a response to the open general questions regarding knowledge and its representation, 
AI has identified numerous and very specific ways to store and retrieve 
information which is authored, obtained, or learned. The representation of knowledge 
about a task or a problem can be viewed as the computational mapping of the 
task under investigation. On that basis, the representation needs to store knowledge 
about the task in a format that a machine is able to process, such as a data structure. 

To enable any form of artificial intelligence, knowledge needs to be represented 
computationally and the ways this can happen are many. Representation types include 
grammars such as grammatical evolution, graphs such as finite state machines 
or probabilistic models, trees such as decision trees, behavior trees and genetic 
programming, connectionism such as artificial neural networks, genetic such 
as genetic algorithms and evolutionary strategies and tabular such as temporal difference 
learning and Q-learning. As we will see in the remainder of this book, all of the 
above representation types find dissimilar uses in games and can be associated with 
various game AI tasks. 
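
To make two of these representation types concrete, here is a minimal Python sketch of a graph representation (a finite state machine stored as a transition table) and a tabular representation (one value per state-action pair) for a Ms Pac-Man controller. All state, event and action names, as well as the numeric values in the table, are hypothetical and chosen only for illustration; they are not taken from any particular implementation.

```python
from collections import defaultdict

# Graph representation: a finite state machine as a transition table.
# Keys are states; each maps an event to the state it leads to.
FSM = {
    "seek_pellets": {"ghost_near": "evade_ghosts",
                     "power_pill_eaten": "chase_ghosts"},
    "evade_ghosts": {"ghost_far": "seek_pellets",
                     "power_pill_eaten": "chase_ghosts"},
    "chase_ghosts": {"power_pill_expired": "seek_pellets"},
}


def next_state(state: str, event: str) -> str:
    """Follow a transition if one exists; otherwise stay in the same state."""
    return FSM[state].get(event, state)


# Tabular representation: one value per (state, action) pair.
# Unseen pairs default to 0.0; the values below are made up by hand,
# whereas a learning algorithm would normally fill them in.
q_table = defaultdict(float)
q_table[("ghost_near", "left")] = 0.8
q_table[("ghost_near", "right")] = -0.5


def best_action(table, state: str, actions: list) -> str:
    """Pick the action with the highest stored value for this state."""
    return max(actions, key=lambda action: table[(state, action)])
```

The transition table is authored by hand and never changes during play, while the value table has the same shape whether its entries are authored or learned. This is one way to see that the representation is a separate choice from the algorithm that fills it.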

One thing is certain for any AI algorithm that is tried on a particular task: the 
chosen representation has a major impact on the performance of the algorithm. Un- 
fortunately, the type of representation to be chosen for a task follows the no free 
lunch theorem ESI, suggesting that there is no single representation type which 
is ideal for the task at hand. As a general set of guidelines, however, the representation 
chosen should be as simple as possible. Simplicity usually comes as a 
delicate balance between computational effort and algorithm performance as either 
being over-detailed or over-simplistic will affect the performance of the algorithm. 
Furthermore, the representation chosen should be as small as possible given the 
complexity of the task at hand. Neither simplicity nor size are trivial decisions to 
make with respect to the representation. Good representations come with sufficient 
practical wisdom and empirical knowledge about the complexity and the qualitative 
features of the problem the AI is trying to solve. 


2.1.2 Utility 

Utility in game theory (and economics at large) is a measure of rational choice 
when playing a game. In general, it can be viewed as a function that is able to assist 
a search algorithm to decide which path to take. For that purpose, the utility function 
samples aspects of the search space and gathers information about the “goodness” 
of areas in the space. In a sense, a utility function is an approximation of the solution
we try to find. In other words, it is a measure of goodness of the existing
representation we search through. 

Similar concepts to the utility include the heuristic used by computer science
and AI as an approximate way to solve a problem faster when exact methods are too
slow to afford, in particular associated with the tree search paradigm. The concept
of fitness is used similarly as a utility function that measures the degree to which a
solution is good, primarily in the area of evolutionary computation. In mathematical
optimization, the objective, loss, cost, or error function is the utility function to be
minimized (or maximized if that is the objective). In particular, in supervised
learning the error function represents how well an approach maps training examples to
target (desired) outputs. In the area of reinforcement learning and Markov decision
processes, instead, the utility is named reward, which is a function an agent attempts
to maximize by learning to take the right action in a particular state. Finally, in the
area of unsupervised learning, utility is often provided internally and within the
representation via, e.g., competitive learning or self-organization.

Similarly to selecting an appropriate representation, the selection of a utility
function follows the no free lunch theorem. A utility is generally difficult to
design and sometimes the design task is basically impossible. Simplicity of design
pays off, but so does completeness. The quality of a utility function largely
depends on thorough empirical research and practical experience, which is gained
within the domain under investigation.


2.1.3 Learning = Maximize Utility (Representation) 

The utility function is the drive for search and essential for learning. On that basis,
the utility function is the training signal of any machine learning algorithm as it
offers a measure of goodness of the representation we have. Thereby it implicitly
provides indications on what to do to further increase the current goodness of the
representation. Systems that do not require learning (such as AI methods that are
based on ad-hoc designed representations, or expert-knowledge systems) do not require a
utility. In supervised learning the utility is sampled from data—i.e., good
input-output patterns. In reinforcement learning and evolutionary computation, instead,
the training signal is provided by the environment—i.e., rewards for doing
something well and punishments for doing something wrong. Finally, in unsupervised
learning the training signal derives from the internal structure of the representation.
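
As a minimal sketch of this idea, assume a representation made of just two numbers (w, b) and a utility defined as the negative squared error on a handful of input-output pairs. The data, the names `utility` and `learn`, and the hill-climbing update are all assumptions chosen for brevity; any learning algorithm that pushes the utility up would illustrate the same point.

```python
import random

# Hypothetical training data sampled from the target mapping y = 2x + 1.
DATA = [(x, 2 * x + 1) for x in range(-3, 4)]


def utility(rep):
    """Goodness of a representation (w, b): negative squared error on DATA."""
    w, b = rep
    return -sum((w * x + b - y) ** 2 for x, y in DATA)


def learn(steps=2000, step_size=0.1, seed=0):
    """Hill climbing: keep any random change that increases the utility."""
    rng = random.Random(seed)
    rep = (0.0, 0.0)  # initial representation
    for _ in range(steps):
        candidate = (rep[0] + rng.gauss(0, step_size),
                     rep[1] + rng.gauss(0, step_size))
        if utility(candidate) > utility(rep):
            rep = candidate
    return rep


w, b = learn()
print(f"w={w:.2f}, b={b:.2f}, utility={utility((w, b)):.4f}")
```

Here the utility plays the role of the training signal: the learner never sees the target mapping directly, only whether a change to the representation made things better or worse.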


2.2 Ad-Hoc Behavior Authoring 

In this section we discuss the first, and arguably the most popular, class of AI
methods for game development. Finite state machines, behavior trees and
utility-based AI are ad-hoc behavior authoring methods that have traditionally dominated
the control of non-player characters in games. Their dominance is evident from the fact
that the term game AI in the game development scene is still synonymous
with the use of these methods.


2.2.1 Finite State Machines

A Finite State Machine (FSM) [230]—and FSM variants such as hierarchical
FSMs—is the game AI method that dominated the control and decision making
processes of non-player characters in games up until the mid-2000s.

FSMs belong to the expert-knowledge systems area and are represented as
graphs. An FSM graph is an abstract representation of an interconnected set of
objects, symbols, events, actions or properties of the phenomenon that needs to be
ad-hoc designed (represented). In particular, the graph contains nodes (states) which
embed some mathematical abstraction and edges (transitions) which represent a
conditional relationship between the nodes. The FSM can only be in one state at
a time; the current state can change to another if the condition in the corresponding
transition is fulfilled. In a nutshell, an FSM is defined by three main components:

• A number of states which store information about a task—e.g., you are currently
in the explore state.

• A number of transitions between states which indicate a state change and are
described by a condition that needs to be fulfilled—e.g., if you hear a fire shot,
move to the alerted state.

• A set of actions that need to be followed within each state—e.g., while in the 
explore state move randomly and seek opponents. 

FSMs are incredibly simple to design, implement, visualize, and debug. Further,
they have proven over the years that they work well with games.
However, they can be extremely complex to design on a large scale and are, thereby,
computationally limited to certain tasks within game AI. An additional critical
limitation of FSMs (and all ad-hoc authoring methods) is that they are not flexible and
dynamic (unless purposely designed). After their design is completed, tested and
debugged there is limited room for adaptivity and evolution. As a result, FSMs end
up depicting very predictable behaviors in games. We can, in part, overcome such a
drawback by representing transitions as fuzzy rules [532] or probabilities [109].


2.2.1.1 An FSM for Ms Pac-Man 

In this section we showcase FSMs as employed to control the Ms Pac-Man agent.
A hypothetical and simplified FSM controller for Ms Pac-Man is illustrated in Fig.
2.1. In this example our FSM has three states (seek pellets, chase ghosts and evade
ghosts) and four transitions (ghosts flashing, no visible ghost, ghost in sight, and
power pill eaten). While in the seek pellets state, Ms Pac-Man moves randomly
up until it detects a pellet and then follows a pathfinding algorithm to eat as many
pellets as possible and as soon as possible. If a power pill is eaten, then Ms Pac-Man
moves to the chase ghosts state in which it can use any tree-search algorithm
to chase the blue ghosts. When the ghosts start flashing, Ms Pac-Man moves to the
evade ghosts state in which it uses tree search to evade ghosts so that none is visible
within a distance; when that happens Ms Pac-Man moves back to the seek pellets
state.
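
The controller described above can be sketched as a transition table. This is a minimal illustration rather than the controller from the figure: the event names, the `MsPacManFSM` class and the placeholder action strings are assumptions, and so is the source state of the ghost in sight transition, which the text does not spell out (it is assumed here to lead from seek pellets to evade ghosts).

```python
from enum import Enum, auto


class State(Enum):
    SEEK_PELLETS = auto()
    CHASE_GHOSTS = auto()
    EVADE_GHOSTS = auto()


# (current state, event) -> next state. Event names are hypothetical labels
# for the four transitions named in the text.
TRANSITIONS = {
    (State.SEEK_PELLETS, "power_pill_eaten"): State.CHASE_GHOSTS,
    (State.SEEK_PELLETS, "ghost_in_sight"): State.EVADE_GHOSTS,  # assumed
    (State.CHASE_GHOSTS, "ghosts_flashing"): State.EVADE_GHOSTS,
    (State.EVADE_GHOSTS, "no_visible_ghost"): State.SEEK_PELLETS,
}

# Action followed while in each state (placeholders for real game logic).
ACTIONS = {
    State.SEEK_PELLETS: "follow a path to the nearest pellets",
    State.CHASE_GHOSTS: "tree-search towards the blue ghosts",
    State.EVADE_GHOSTS: "tree-search away from the ghosts",
}


class MsPacManFSM:
    def __init__(self):
        self.state = State.SEEK_PELLETS  # the FSM is in exactly one state

    def handle(self, event):
        """Fire a transition if one is defined for (state, event); else stay."""
        self.state = TRANSITIONS.get((self.state, event), self.state)
        return self.state

    def act(self):
        return ACTIONS[self.state]


fsm = MsPacManFSM()
trace = [fsm.handle(event).name
         for event in ["power_pill_eaten", "ghosts_flashing", "no_visible_ghost"]]
print(trace)
```

Keeping the transitions in a table makes the predictability discussed earlier visible: the same event in the same state always leads to the same next state unless the table itself is changed.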


2.2.2 Behavior Trees 

A Behavior Tree (BT) III10II1121I11111 is an expert-knowledge system which,
similarly to an FSM, models transitions between a finite set of tasks (or behaviors). The
strength of BTs compared to FSMs is their modularity; if designed well, they can
yield complex behaviors composed of simple tasks. The main difference between
BTs and FSMs (or even hierarchical FSMs) is that they are composed of behaviors
rather than states. As with finite state machines, BTs are easy to design, test and
debug, which made them dominant in the game development scene after their
successful application in games such as Halo 2 (Microsoft Game Studios, 2004) IMl
and Bioshock (2K Games, 2007).

A BT employs a tree structure with a root node and a number of parent and
corresponding child nodes representing behaviors—see Fig. 2.2 for an example. We
traverse a BT starting from the root. We then activate the execution of parent-child
pairs as denoted in the tree. A child may return the following values to the parent
in predetermined time steps (ticks): run if the behavior is still active, success if the
behavior is completed, failure if the behavior failed. BTs are composed of three
node types: the sequence, the selector, and the decorator, the basic functionality of
which is described below:


• Sequence (see blue rectangle in Fig. 2.2): if the child behavior succeeds, the
sequence continues and eventually the parent node succeeds if all child behaviors
succeed; otherwise the sequence fails.

• Selector (see red rounded rectangle in Fig. 2.2): there are two main types of
selector nodes: the probability and the priority selectors. When a probability
selector is used, child behaviors are selected based on parent-child probabilities set
by the BT designer. On the other hand, if priority selectors are used, child
behaviors are ordered in a list and tried one after the other. Regardless of the selector
type used, if the child behavior succeeds the selector succeeds. If the child
behavior fails, the next child in the order is selected (in priority selectors) or the
selector fails (in probability selectors).

• Decorator (see purple hexagon in Fig. 2.2): the decorator node adds complexity
to and enhances the capacity of a single child behavior. Decorator examples
include the number of times a child behavior runs or the time given to a child
behavior to complete the task.
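
The three node types can be sketched in a few small classes. This is a simplified illustration, not a production BT library: the class names, the `tick` method and the `do` helper are assumptions, every node is re-evaluated from scratch on each tick (real implementations resume running children), and a fixed `Repeat` count stands in for a condition-based decorator. The example tree loosely follows the attack enemy tree described in the caption of Fig. 2.2.

```python
import random

SUCCESS, FAILURE, RUNNING = "success", "failure", "run"


class Action:
    """Leaf behavior: wraps a function that returns one of the three statuses."""

    def __init__(self, name, fn):
        self.name, self.fn = name, fn

    def tick(self):
        return self.fn()


class Sequence:
    """Succeeds only if all children succeed, tried from left to right."""

    def __init__(self, *children):
        self.children = children

    def tick(self):
        for child in self.children:
            status = child.tick()
            if status != SUCCESS:
                return status  # a failing (or still running) child stops it
        return SUCCESS


class PrioritySelector:
    """Tries children in priority order until one does not fail."""

    def __init__(self, *children):
        self.children = children

    def tick(self):
        for child in self.children:
            status = child.tick()
            if status != FAILURE:
                return status
        return FAILURE


class ProbabilitySelector:
    """Picks a single child at random, using designer-set weights."""

    def __init__(self, weighted_children, rng=None):
        self.children = [child for child, _ in weighted_children]
        self.weights = [weight for _, weight in weighted_children]
        self.rng = rng or random.Random()

    def tick(self):
        child = self.rng.choices(self.children, weights=self.weights, k=1)[0]
        return child.tick()  # if the chosen child fails, the selector fails


class Repeat:
    """Decorator: runs its single child a fixed number of times."""

    def __init__(self, child, times):
        self.child, self.times = child, times

    def tick(self):
        for _ in range(self.times):
            if self.child.tick() == FAILURE:
                return FAILURE
        return SUCCESS


log = []


def do(name, status=SUCCESS):
    """Hypothetical leaf that records its name and returns a fixed status."""
    def fn():
        log.append(name)
        return status
    return Action(name, fn)


select_weapon = ProbabilitySelector(
    [(do("mini gun"), 0.5), (do("rocket launcher"), 0.3), (do("pistol"), 0.2)],
    rng=random.Random(0),
)
attack_enemy = Sequence(do("spot enemy"), select_weapon, do("aim"),
                        Repeat(do("shoot"), times=3))
result = attack_enemy.tick()
print(result, log)
```

Because every node exposes the same `tick` interface, subtrees can be swapped or nested freely, which is the modularity the text attributes to BTs.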


Compared to FSMs, BTs are more flexible to design and easier to test; they still,
however, suffer from similar drawbacks. In particular, their dynamicity is rather low
given that they are static knowledge representations. The probability selector nodes
may add to their unpredictability, and methods to adapt their tree structures have
already shown some promise [385]. There is also a certain degree of similarity
between BTs and ABL (A Behavior Language) [440] introduced by Mateas and Stern
for story-based believable characters; their dissimilarities have also been reported
[749]. Note, however, that this section barely scratches the surface of what is
possible with BT design as there are several extensions to their basic structure that help
BTs improve on their modularity and their capacity to deal with more complex
behavior designs [170, 627].


2.2.2.1 A BT for Ms Pac-Man 

Similarly to the FSM example above, we use Ms Pac-Man to demonstrate the use
of BTs in a popular game. In Fig. 2.3 we illustrate a simple BT for the seek pellets
behavior of Ms Pac-Man. While in the seek pellets sequence behavior Ms Pac-Man
will first move (selector), it will then find a pellet and finally it will keep eating
pellets until a ghost is found in sight (decorator). While in the move behavior—which
is a priority selector—Ms Pac-Man will prioritize ghost-free corridors over
corridors with pellets and over corridors without pellets.



Fig. 2.2 A behavior tree example. The root of the BT is a sequence behavior (attack enemy) which 
executes the child behaviors spot enemy, select weapon, aim and shoot in sequence from left to 
right. The select weapon behavior is a probability selector giving higher probability—denoted by 
the thickness of the parent-child connecting lines—to the mini gun (0.5) compared to the rocket 
launcher (0.3) or the pistol (0.2). Once in the shoot behavior the decorator until health = 0 requests 
the behavior to run until the enemy dies. 



Fig. 2.3 A BT example for the seek pellets behavior of Ms Pac-Man.


2.2.3 Utility-Based AI

As has been pointed out by several industrial game AI developers, the lack of
behavioral modularity across games and in-game tasks is detrimental for the development
of high quality AI [605, 171]. An increasingly popular method for ad-hoc behavior
authoring that eliminates the modularity limitations of FSMs and BTs is the
utility-based AI approach, which can be used for the design of control and decision
making systems in games [425, 557]. Following this approach, instances in the
game get assigned a particular utility function that gives a value for the importance
of the particular instance lfT0lll69l. Examples include the importance of an enemy being
present at a particular distance or the importance of an agent’s health being low in
a particular context. Given the set of all utilities available to an agent and all the
options it has, utility-based AI decides which is the most important option it should
consider at this moment [426]. The utility-based approach is grounded in the utility
theory of economics and is based on utility function design. The approach is similar
to the design of membership functions in a fuzzy set.

A utility can measure anything from observable objective data (e.g., enemy
health) to subjective notions such as emotions, mood and threat. The various
utilities about possible actions or decisions can be aggregated into linear or non-linear
formulas and guide the agent to take decisions based on the aggregated utility. The
utility values can be checked every n frames of the game. So while FSMs and BTs
would examine one decision at a time, utility-based AI architectures examine all
available options, assign a utility to them and select the option that is most
appropriate (highest utility).

As an example of utility-based AI we will build on the one appearing in [426] for
weapon selection. For selecting a weapon an agent needs to consider the following
aspects: range, inertia, random noise, ammo and indoors. The range utility function
adds value to the utility of a weapon depending on the distance—for instance, if
the distance is short, pistols are assigned higher utility. Inertia assigns higher utility
value to the current weapon so that changes of weapons are not very frequent.
Random noise adds non-determinism to the selection so that the agent does not always
pick the same weapon given the same game situation. Ammo returns a utility about
the current level of ammunition and indoors penalizes the use of particular weapons
indoors, such as a grenade, through a boolean utility function (e.g., 0 utility value if
the grenade is used indoors; 1 otherwise). Our agent makes a regular check of the
available weapons, assigns utility scores to all of them and selects the weapon with
the best total utility.
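
The weapon selection example can be sketched as follows. The five considerations come from the text, but everything else is an assumption made for illustration: the weapon table, the shape of each utility function, the constants, and the way the considerations are aggregated (a sum gated by the boolean indoors factor).

```python
import random

# Hypothetical weapon data; none of these numbers come from a real game.
WEAPONS = {
    "pistol": {"ideal_range": 5.0, "ammo": 12, "max_ammo": 12, "indoors_ok": True},
    "rifle": {"ideal_range": 40.0, "ammo": 10, "max_ammo": 30, "indoors_ok": True},
    "grenade": {"ideal_range": 15.0, "ammo": 2, "max_ammo": 4, "indoors_ok": False},
}


def total_utility(name, distance, indoors, current, rng, noise=0.05):
    """Aggregate the five considerations into one score for a weapon."""
    weapon = WEAPONS[name]
    range_u = 1.0 / (1.0 + abs(distance - weapon["ideal_range"]))  # range
    inertia_u = 0.1 if name == current else 0.0                    # inertia
    noise_u = rng.uniform(0.0, noise)                              # random noise
    ammo_u = weapon["ammo"] / weapon["max_ammo"]                   # ammo
    # Boolean consideration: 0 if the weapon must not be used indoors.
    indoors_u = 0.0 if (indoors and not weapon["indoors_ok"]) else 1.0
    return indoors_u * (range_u + inertia_u + noise_u + ammo_u)


def select_weapon(distance, indoors, current, rng):
    """Score every available weapon and return the best one with all scores."""
    scores = {name: total_utility(name, distance, indoors, current, rng)
              for name in WEAPONS}
    return max(scores, key=scores.get), scores


rng = random.Random(1)
choice, scores = select_weapon(distance=5.0, indoors=True, current="rifle", rng=rng)
print(choice, scores)
```

Adding a new consideration only requires one more term in `total_utility`, which is the kind of extensibility that makes the approach attractive; how the terms should be weighted against each other remains a design decision.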

Utility-based AI has certain advantages compared to other ad-hoc authoring
techniques. It is modular as the decision of the game agent is dependent on a number
of different factors (or considerations); this list of factors can be dynamic.
Utility-based AI is also extensible as we can easily author new types of considerations as
we see fit. Finally, the method is reusable as utility components can be
transferred from one decision to another and from one game to another. As a result of
these advantages utility-based AI is gradually gaining traction in the game industry
scene [557, 171]. Utility-based AI has seen widespread use across game genres
and has been featured, among others, in Kohan 2: Kings of War (Take Two
Interactive and Global Star Software, 2004), in Iron Man (Sega, 2008) for controlling
the boss, in Red Dead Redemption (Rockstar Games, 2010) for weapon and dialog
selection, and in Killzone 2 (Sony Computer Entertainment, 2009) and in F.E.A.R.
(Sierra Entertainment, 2005) for dynamic tactical decision making [426].


Fig. 2.4 A utility-based approach for controlling Ms Pac-Man behavior. The threat level (x-axis) is
a function that lies between 0 and 1 which is based on the current position of ghosts. Ms Pac-Man
considers the current level of threat, assigns utility values (through the three different curves) and
decides to follow the behavior with the highest utility value. In this example the utility of ghost
evading rises exponentially with the level of threat. The utility of ghost hunting decreases linearly
with respect to threat up to a point where it stabilizes; it then decreases linearly as the threat level
increases above a threshold value. Finally, the utility of pellet seeking increases linearly up to a
considerable threat level from which point it decreases exponentially.


2.2.3.1 Utility-Based AI for Ms Pac-Man 

Once again we use Ms Pac-Man to demonstrate the use of utility-based AI. Figure
2.4 illustrates an example of three simple utility functions that could be considered
by Ms Pac-Man during play. Each function corresponds to a different behavior that
is dependent on the current threat level of the game; threat is, in turn, a function of
current ghost positions. At any point in the game Ms Pac-Man selects the behavior
with the highest utility value.
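
A possible encoding of such curves is sketched below. Only the qualitative shapes follow the figure caption (exponential rise for evading, piecewise linear decline for hunting, linear rise then exponential decay for pellet seeking); the breakpoints, slopes and constants are invented for this example and are not read off the figure.

```python
import math


def evade_utility(threat):
    """Rises exponentially with threat; scaled so that it spans 0 to 1."""
    return (math.exp(4 * threat) - 1) / (math.exp(4) - 1)


def hunt_utility(threat):
    """Decreases linearly, plateaus, then decreases linearly again."""
    if threat < 0.3:
        return 0.9 - threat
    if threat < 0.6:
        return 0.6
    return max(0.0, 0.6 - 1.5 * (threat - 0.6))


def seek_utility(threat):
    """Increases linearly up to threat 0.5, then decays exponentially."""
    if threat <= 0.5:
        return 0.5 + 0.8 * threat
    return 0.9 * math.exp(-6 * (threat - 0.5))


BEHAVIORS = {
    "evade ghosts": evade_utility,
    "hunt ghosts": hunt_utility,
    "seek pellets": seek_utility,
}


def best_behavior(threat):
    """Select the behavior whose utility is highest at this threat level."""
    return max(BEHAVIORS, key=lambda name: BEHAVIORS[name](threat))


for threat in (0.1, 0.5, 0.9):
    print(threat, best_behavior(threat))
```

With these made-up constants the agent hunts at low threat, seeks pellets at moderate threat and evades at high threat; changing a curve changes the behavior without touching the selection logic.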


2.2.3.2 A Short Note on Ad-Hoc Behavior Authoring 

It is important to remember that all three methods covered in this section (and, in
general, the methods covered in this chapter) represent the very basic variants of the
algorithms. As a result, the algorithms we covered appear as static representations
of states, behaviors or utility functions. It is possible, however, to create dynamic
variants of those by adding non-deterministic or fuzzy elements; for instance, one
may employ fuzzy transitions in an FSM or evolve behaviors in a BT. Further, it
is important to note that these ad-hoc designed architectures can feature any of the
methods this book covers in the remainder of this chapter. Basic processing elements
such as an FSM state, a BT behavior or a utility function, or even more complex
hierarchies of nodes, trees or functions, can be replaced by any other AI method,
yielding hybrid algorithms and agent architectures. Note that possible extensions of
the algorithms can be found in the work we cite in the corresponding section of each
algorithm but also in the reading list we provide next.


2.2.4 Further Reading 

Further details on how to build and test FSMs and hierarchical FSMs can be found
in [62]. For behavior trees we recommend the online tutorials and blogposts of
A. Champandard found at the http://aigamedev.com/ portal OllOI II 1 III and recent
adaptations of the basic behavior tree structure as in [627]. Finally, the book of
Dave Mark [425] is a good starting point for the study of utility-based AI and its
application to control and decision making in games.

When it comes to software, a BT tool has been integrated within the Unreal
Engine (https://docs.unrealengine.com/latest/INT/Engine/) while several other BT
Unity tools (for instance, see http://nodecanvas.paradoxnotion.com/ or
http://www.opsive.com/) are available for the interested reader. Further, the Behave
system (http://eej.dk/community/documentation/behave/0-lntroduction.html)
streamlines the iterative process of designing, integrating and debugging behavior
trees and utility-based AI.


2.3 Tree Search

It has often been claimed that most, if not all, of artificial intelligence is really just
search. Almost every AI problem can be cast as a search problem, which can be
solved by finding the best (according to some measure) plan, path, model, function,
etc. Search algorithms are therefore often seen as being at the core of AI, to the
point that many textbooks (such as Russell and Norvig’s famous textbook [582])
start with a treatment of search algorithms.

The algorithms presented below can all be characterized as tree search
algorithms as they can be seen as building a search tree where the root is the node
representing the state where the search starts. Edges in this tree represent actions
the agent takes to get from one state to another, and nodes represent states. Because
there are typically several different actions that can be taken in a given state, the tree
branches. Tree search algorithms mainly differ in which branches are explored and
in what order.


2.3.1 Uninformed Search 

Uninformed search algorithms are algorithms which search a state space without 
any further information about the goal. The basic uninformed search algorithms are 
commonly seen as fundamental computer science algorithms, and are sometimes
not even seen as AI. 

Depth-first search is a search algorithm which explores each branch as far as
possible before backtracking and trying another branch. At every iteration of its
main loop, depth-first search selects a branch and then moves on to explore the
resulting node in the next iteration. When a terminal node is reached—one from
which it is not possible to advance further—depth-first search backtracks up the list
of visited nodes until it finds one which has unexplored actions. When used for
playing a game, depth-first search explores the consequences of a single move until
the game is won or lost, and then goes on to explore the consequences of taking a
different move close to the end states.

Breadth-first search does the opposite of depth-first search. Instead of exploring 
all the consequences of a single action, breadth-first search explores all the actions 
from a single node before exploring any of the nodes resulting from taking those 
actions. So, all nodes at depth one are explored before all nodes at depth two, then 
all nodes at depth three, etc. 
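
The difference between the two strategies comes down to the data structure that holds the nodes waiting to be explored: a stack for depth-first search and a queue for breadth-first search. The sketch below assumes a tiny hand-made tree (`TREE`) and returns the order in which nodes are visited; since the structure is a tree, no check for repeated states is included.

```python
from collections import deque

# A small hypothetical search tree: each state maps to the states reachable
# from it by a single action.
TREE = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F", "G"],
    "D": [], "E": [], "F": [], "G": [],
}


def depth_first(root, goal):
    """Return the order in which nodes are visited until the goal is found."""
    stack, visited = [root], []
    while stack:
        node = stack.pop()  # most recently added node first
        visited.append(node)
        if node == goal:
            break
        # reversed() so that the leftmost child is explored first
        stack.extend(reversed(TREE[node]))
    return visited


def breadth_first(root, goal):
    """Same interface, but all nodes at one depth are visited before the next."""
    queue, visited = deque([root]), []
    while queue:
        node = queue.popleft()  # oldest node first
        visited.append(node)
        if node == goal:
            break
        queue.extend(TREE[node])
    return visited


print(depth_first("A", "F"))
print(breadth_first("A", "F"))
```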

While the aforementioned are fundamental uninformed search algorithms, there 
are many variations and combinations of these algorithms, and new uninformed 
search algorithms are being developed. More information about uninformed search 
algorithms can be found in Chapter 4 of [582].

It is rare to see uninformed search algorithms used effectively in games, but there
are exceptions such as iterative width search [58], which does surprisingly well in
general video game playing, and the use of breadth-first search to evaluate aspects
of strategy game maps in Sentient Sketchbook [379]. Also, it is often illuminating to
compare the performance of state-of-the-art algorithms with a simple uninformed 
search algorithm. 


2.3.1.1 Uninformed Search for Ms Pac-Man 

A depth-first approach in Ms Pac-Man would normally consider the branches of 
the game tree until Ms Pac-Man either completes the level or loses. The outcome 
of this search for each possible action would determine which action to take at a 
given moment. Breadth-first instead would first explore all possible actions of Ms 
Pac-Man at the current state of the game (e.g., going left, up, down or right) and
would then explore all their resulting nodes (children) and so on. The game tree of
either method is too big and complex to visualize within a Ms Pac-Man example. 


2.3.2 Best-First Search 

In best-first search, the expansion of nodes in the search tree is informed by some
knowledge about the goal state. In general, the node that is closest to the goal state
by some criterion is expanded first. The most well-known best-first search algorithm 
is A* (pronounced A star). The A* algorithm keeps a list of “open” nodes, which 
are next to an explored node but which have not themselves been explored. For each 
open node, an estimate of its distance from the goal is made. New nodes are chosen 
to explore based on a lowest cost basis, where the cost is the distance from the origin 
node plus the estimate of the distance to the goal. 
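
A compact sketch of A* on a 4-connected grid follows. The grid encoding ('.' for free cells, '#' for walls), the unit step cost and the Manhattan distance heuristic are assumptions made for the example; the open list is kept in a priority queue ordered by the cost described above, i.e., the distance from the origin plus the estimated distance to the goal.

```python
import heapq


def a_star(grid, start, goal):
    """Return a shortest 4-connected path on a grid of '.' (free) and '#' (wall)."""
    rows, cols = len(grid), len(grid[0])

    def h(cell):  # Manhattan distance: the estimate of the distance to the goal
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]  # entries are (g + h, g, cell)
    g_cost = {start: 0}                 # best known distance from the origin
    parent = {start: None}
    closed = set()
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)  # lowest-cost open node
        if cell in closed:
            continue  # stale entry for a node that was already expanded
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        closed.add(cell)
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                new_g = g + 1
                if new_g < g_cost.get((nr, nc), float("inf")):
                    g_cost[(nr, nc)] = new_g
                    parent[(nr, nc)] = cell
                    heapq.heappush(open_heap,
                                   (new_g + h((nr, nc)), new_g, (nr, nc)))
    return None  # the goal cannot be reached


GRID = [
    "....",
    ".##.",
    "....",
]
path = a_star(GRID, (0, 0), (2, 3))
print(path)
```

With a heuristic that never overestimates the remaining distance, as is the case for Manhattan distance on this kind of grid, the first path found to the goal is a shortest one.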

A* can easily be understood as navigation in two- or three-dimensional space.
Variants of this algorithm are therefore commonly used for pathfinding in games.
In many games, the “AI” essentially amounts to non-player characters using A*
pathfinding to traverse scripted points. In order to cope with large, deceptive spaces
numerous modifications of this basic algorithm have been proposed, including
hierarchical versions of A* 1^16611, real-time heuristic search ll82l, jump point search
for uniform-cost grids [246], 3D pathfinding algorithms ||68l, planning algorithms
for dynamic game worlds [495] that enable the animation of crowds in
collision-free paths 1163II and approaches for pathfinding in navigation meshes 11^17221. The
work of Steve Rabin and Nathan Sturtevant on grid-based pathfinding [551, 662]
and pathfinding architectures [550] are notable examples. Sturtevant and colleagues
have also been running a dedicated competition on grid-based path-planning [665]
since 2012 (http://movingai.com/GPPC/). For the interested reader, Sturtevant [663]
has released a list of benchmarks for grid-based pathfinding in games
(http://movingai.com/benchmarks/), including Dragon Age: Origins (Electronic Arts,
2009), StarCraft (Blizzard Entertainment, 1998) and Warcraft III: Reign
of Chaos (Blizzard Entertainment, 2002).

However, A* can also be used to search in the space of game states, as opposed
to simply searching physical locations. This way, best-first search can be used for
planning rather than just navigation. The difference is in taking the changing state
of the world (rather than just the changing state of a single agent) into account.
Planning with A* can be surprisingly effective, as evidenced by the winner of the
2009 Mario AI Competition—where competitors submitted agents playing Super
Mario Bros (Nintendo, 1985)—being based on a simple A* planner that simply
tried to get to the right end of the screen at all times [717, 705] (see also Fig. 2.5).


Fig. 2.5 The A* controller of the 2009 Mario AI Competition Champion by R. Baumgarten [705].
The red lines illustrate possible future trajectories considered by the A* controller of Mario, taking
the dynamic nature of the game into account.


2.3.2.1 Best-First Search for Ms Pac-Man 

Best-first search can be applicable in Pac-Man in the form of A*. Following the
paradigm of the 2009 Mario AI competition champion, Ms Pac-Man can be
controlled by an A* algorithm that searches through possible game states within a short
time frame and takes a decision on where to move next (up, down, left or right).
The game state can be represented in various ways: from a very direct, yet costly,
representation that takes ghost and pellet coordinates into account to an indirect
representation that considers the distance to the closest ghost or pellet. Regardless of
the representation chosen, A* requires the design of a cost function that will drive
the search. Relevant cost functions for Ms Pac-Man would normally reward moves
to areas containing pellets and penalize moves to areas containing ghosts.


2.3.3 Minimax 

For single-player games, simple uninformed or informed search algorithms can be
used to find a path to the optimal game state. However, for two-player adversarial
games, there is another player that tries to win as well, and the actions of each
player depend very much on the actions of the other player. For such games we
need adversarial search, which includes the actions of two (or more) adversarial
players. The basic adversarial search algorithm is called Minimax. This algorithm
has been used very successfully for playing classic perfect-information two-player
board games such as Checkers and Chess, and was in fact (re)invented specifically
for the purpose of building a Chess-playing program [725].

The core loop of the Minimax algorithm alternates between player 1 and player
2—such as the white and black player in Chess—named the min and the max player.
For each player, all possible moves are explored. For each of the resulting states,
all possible moves by the other player are also explored, and so on until all the
possible combinations of moves have been explored to the point where the game
ends (e.g., with a win, a loss or a draw). The result of this process is the generation
of the whole game tree from the root node down to the leaves. The outcome of the
game informs the utility function which is applied onto the leaf nodes. The utility
function estimates how good the current game configuration is for a player. Then, the
algorithm traverses up the search tree to determine what action each player would
have taken at any given state by backing-up values from leaves through the branch
nodes. In doing so, it assumes that each player tries to play optimally. Thus, from
the standpoint of the max player, it tries to maximize its score, whereas min tries to
minimize the score of max; hence, the name Minimax. In other words, a max node of
the tree computes the max of its child values whereas a min node computes the min
of its child values. The optimal winning strategy is then obtained for max if, on min’s
turn, a win is obtainable for max for all moves that min can make. The corresponding
optimal strategy for min is when a win is possible independently of what move max
will take. To obtain a winning strategy for max, for instance, we start at the root of
the tree and we iteratively choose the moves leading to child nodes of highest value
(on min’s turn the child nodes with the lowest value are selected instead). Figure 2.6
illustrates the basic steps of Minimax through a simple example.
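
The backing-up of values can be sketched on a game tree written as nested lists. The tree shape mirrors a game in which max moves, then min, then max again, with two options each; the leaf scores are made up for this example and are not taken from the figure, and the function names are likewise illustrative.

```python
def minimax(node, maximizing):
    """Back up values from the leaves of a game tree given as nested lists.

    A leaf is a number (the utility for max); an inner node is a list of children.
    """
    if not isinstance(node, list):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def best_move(node, maximizing=True):
    """Return (index of the best child, its backed-up value) for the player to move."""
    values = [minimax(child, not maximizing) for child in node]
    pick = max if maximizing else min
    best = pick(values)
    return values.index(best), best


# max moves at the root, min at depth one, max at depth two.
TREE = [
    [[3, 5], [-2, 9]],
    [[0, -1], [7, 4]],
]
print(minimax(TREE, True))
print(best_move(TREE))
```

In this tree the left move is worth 5 to max (min picks the smaller of 5 and 9) while the right move is worth 0, so max chooses the left branch.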

Of course, exploring all possible moves and countermoves is infeasible for any 
game of interesting complexity, as the size of the search tree increases exponentially 
with the depth of the game or the number of moves that are simulated. Indicatively, 
tic-tac-toe has a game tree size of 9! = 362,880 states which is feasible to traverse
through; however, the Chess game tree has approximately lO*^'^ nodes which is 
infeasible to search through with modern computers. Therefore, almost all actual 
applications of the Minimax algorithm cut off search at a given depth, and use a state 
evaluation function to evaluate the desirability of each game state at that depth. For 
example, in Chess a simple state evaluation function would be to merely sum the 
number of white pieces on the board and subtract the number of black pieces; the 
higher this number is, the better the situation is for the white player. (Of course, 
much more sophisticated board evaluation functions are commonly used.) Together 
with improvements to the basic Minimax algorithm such as α-β pruning and the
use of non-deterministic state evaluation functions, some very competent programs
emerged for many classic games (e.g., IBM’s Deep Blue). More information about
Minimax and other adversarial search algorithms can be found in Chapter 6 of [582].


Fig. 2.6 An abstract game tree illustrating the Minimax algorithm. In this hypothetical game of
two options for each player max (represented as red squares) plays first, min (represented as blue 
diamonds) plays second and then max plays one last time. White squares denote terminal nodes 
containing a winning (positive), a losing (negative) or a draw (zero) score for the max player. 
Following the Minimax strategy, the scores (utility) are traversed up to the root of the game tree. 
The optimal play for max and min is illustrated in bold. In this simple example if both players play 
optimally, max wins a score of 5. 
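
The depth cut-off and alpha-beta pruning mentioned earlier in this section can be sketched on tic-tac-toe, which is small enough to also be searched exhaustively. Everything specific here is an assumption for illustration: the board encoding (a 9-character string), the win/loss scores of ±100, and the state evaluation function, which counts the lines still open for X minus those still open for O.

```python
import math

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def evaluate(board):
    """Heuristic for cut-off states: lines open for X minus lines open for O."""
    open_x = sum(all(board[i] != "O" for i in line) for line in LINES)
    open_o = sum(all(board[i] != "X" for i in line) for line in LINES)
    return open_x - open_o


def alphabeta(board, depth, alpha, beta, player):
    """Value of `board` for X (max), looking `depth` plies ahead; `player` moves."""
    win = winner(board)
    if win:
        return 100 if win == "X" else -100
    moves = [i for i, cell in enumerate(board) if cell == " "]
    if not moves:
        return 0  # draw
    if depth == 0:
        return evaluate(board)  # cut off the search and estimate the state
    if player == "X":
        value = -math.inf
        for m in moves:
            child = board[:m] + "X" + board[m + 1:]
            value = max(value, alphabeta(child, depth - 1, alpha, beta, "O"))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # min would never allow this branch
        return value
    value = math.inf
    for m in moves:
        child = board[:m] + "O" + board[m + 1:]
        value = min(value, alphabeta(child, depth - 1, alpha, beta, "X"))
        beta = min(beta, value)
        if alpha >= beta:
            break  # max would never allow this branch
    return value


print(alphabeta(" " * 9, 9, -math.inf, math.inf, "X"))  # full-depth search
print(alphabeta(" " * 9, 2, -math.inf, math.inf, "X"))  # two plies + heuristic
```

Pruning does not change the value returned by Minimax; it only skips branches that cannot influence the decision. The depth limit, in contrast, trades accuracy for speed, and the quality of the result then depends on the evaluation function.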


2.3.3.1 Minimax for Ms Pac-Man 

Strictly speaking, Minimax is not applicable to Ms Pac-Man as the game is non- 
deterministic and, thus, the Minimax tree is formally unknown. (Of course Minimax 
variants with heuristic evaluation functions can be eventually applicable.) Minimax 
is however applicable to Ms Pac-Man’s deterministic ancestor, Pac-Man (Namco, 
1980). Again strictly speaking, Pac-Man is a single-player adversarial game. As 
such Minimax is applicable only if we assume that Pac-Man plays against adver- 
saries (ghosts) who make optimal decisions. It is important to note that ghosts’ 
movements are not represented by tree nodes; instead, they are simulated based on 
their assumed optimal play. Game tree nodes in Pac-Man may represent the game 
state including the position of Pac-Man, the ghosts, and the current pellets and power 
pills available. The branches of the Minimax tree are the available moves of Pac-Man
in each game state. The terminal nodes can, for instance, feature either a binary
utility (1 if Pac-Man completes the level; 0 if Pac-Man was killed by a ghost) or the 
final score of the game. 

2.3.4 Monte Carlo Tree Search


There are many games which Minimax will not play well. In particular, in games with
a high branching factor (where there are many potential actions to take at any given
point in time) Minimax will only ever search a very shallow tree. Another
aspect of games which frequently throws a spanner in the works of Minimax
is the difficulty of constructing a good state evaluation function. The board game Go
is a deterministic, perfect information game that is a good example of both of these 
phenomena. Go has a branching factor of approximately 300, whereas Chess typi- 
cally has around 30 actions to choose from. The positional nature of the Go game, 
which is all about surrounding the adversary, makes it very hard to correctly esti- 
mate the value of a given board state. For a long time, the best Go-playing programs 
in the world, most of which were based on Minimax, could barely exceed the play- 
ing strength of a human beginner. In 2007, Monte Carlo Tree Search (MCTS) was 
invented and the playing strength of the best Go programs increased drastically. 

Beyond complex perfect information, deterministic games such as Go, Chess and
Checkers, imperfect information games such as Battleship, Poker and Bridge, and/or
non-deterministic games such as Backgammon and Monopoly, cannot be solved via
Minimax due to the very nature of the algorithm. In such games, MCTS not only 
overcomes the tree size limitation of Minimax but, given sufficient computation, it 
approximates the Minimax tree of the game. 

So how does MCTS handle high branching factors, lack of good state evaluation 
functions, and lack of perfect information and determinism? To begin with, it does
not search all branches of the search tree to an even depth; instead it concentrates
on the more promising branches. This makes it possible to search certain branches
to a considerable depth even though the branching factor is high. Further, to get
around the lack of good evaluation functions, of determinism and of perfect information,
the standard formulation of MCTS uses rollouts to estimate the quality of the
game state, randomly playing from a game state until the end of the game to see the
expected win (or loss) outcome. The utility values obtained via the random simulations
may be used efficiently to adjust the policy towards a best-first strategy (a
Minimax tree approximation).

At the start of a run of the MCTS algorithm, the tree consists of a single node
representing the current state of the game. The algorithm then iteratively builds a search
tree by adding and evaluating new nodes representing game states. This process can
be interrupted at any time, rendering MCTS an anytime algorithm. MCTS requires
only two pieces of information to operate: the game rules that would, in turn, yield
the available moves in the game, and the terminal state evaluation (whether that is
a win, a loss, a draw, or a game score). The vanilla version of MCTS does not require
a heuristic function, which is, in turn, a key advantage over Minimax.

The core loop of the MCTS algorithm can be divided into four steps: Selection,
Expansion (the first two steps are also known as tree policy), Simulation and
Backpropagation. The steps are also depicted in Fig. 2.7.

Selection: In this phase, it is decided which node should be expanded. The
process starts at the root of the tree, and continues until a node is selected
which has unexpanded children. Every time a node (action) is to be selected
within the existing tree a child node j is selected to maximise the UCB1
formula:

    UCB1 = \bar{X}_j + 2 C_p \sqrt{\frac{2 \ln n}{n_j}}        (2.1)

where \bar{X}_j is the average reward of all nodes beneath this node, C_p is an
exploration constant (often set to 1/\sqrt{2}), n is the number of times the parent
node has been visited, and n_j is the number of times the child node j has
been visited. It is important to note that while UCB1 is the most popular formula
used for action selection it is certainly not the only one available. Beyond
equation (2.1), other options include epsilon-greedy, Thompson sampling,
and Bayesian bandits. For instance, Thompson sampling selects actions
stochastically based on their posterior probabilities of being optimal


IMI- 


Expansion: When a node is selected that has unexpanded children—i.e., that 
represents a state from which actions can be taken that have not been at- 
tempted yet—one of these children is chosen for expansion, meaning that a 
simulation is done starting in that state. Selecting which child to expand is 
often done at random. 

Simulation (Default Policy): After a node is expanded, a simulation (or roll- 
out) is done starting from the non-terminal node that was just expanded until 
the end of game to produce a value estimate. Usually, this is performed by 
taking random actions until a termination state is reached, i.e., until the game 
is either won or lost. The state at the end of the game (e.g., -1 if losing, +1
if winning, but could be more nuanced) is used as the reward (Δ) for this
simulation, and propagated up the search tree.

Backpropagation: The reward (the outcome of the simulation) is added to 
the total reward X of the new node. It is also “backed up”: added to the total 
reward of its parent node, its parent’s parent and so on until the root of the 
tree. 
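
The four steps above can be put together in a compact sketch. The code below is one possible reading of vanilla MCTS with UCB1, not a reference implementation. The toy game (single-pile Nim, where players alternately take 1 to 3 stones and whoever takes the last stone wins), the `Node` class and all function names are assumptions introduced for illustration. Rewards are 1 for a win and 0 for a loss, a choice made here so that the exploration constant 1/\sqrt{2} sits on a [0, 1] reward scale.

```python
import math
import random

class Node:
    def __init__(self, stones, player, parent=None, move=None):
        self.stones = stones      # stones left in the pile
        self.player = player      # player to move in this state (0 or 1)
        self.parent = parent
        self.move = move          # move that led from the parent to this node
        self.children = []
        self.untried = [m for m in (1, 2, 3) if m <= stones]
        self.visits = 0
        self.total_reward = 0.0   # seen from the player who moved INTO this node

def ucb1(child, parent_visits, cp):
    mean = child.total_reward / child.visits
    return mean + 2 * cp * math.sqrt(2 * math.log(parent_visits) / child.visits)

def mcts(stones, player, iterations, rng, cp=1 / math.sqrt(2)):
    root = Node(stones, player)
    for _ in range(iterations):
        node = root
        # Selection: descend while the node is fully expanded and not terminal.
        while not node.untried and node.children:
            node = max(node.children, key=lambda c: ucb1(c, node.visits, cp))
        # Expansion: add one randomly chosen unexpanded child.
        if node.untried:
            move = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.stones - move, 1 - node.player, node, move)
            node.children.append(child)
            node = child
        # Simulation (default policy): play random moves until the pile is empty.
        stones_left, to_move = node.stones, node.player
        while stones_left > 0:
            stones_left -= rng.randint(1, min(3, stones_left))
            to_move = 1 - to_move
        winner = 1 - to_move      # the player who took the last stone
        # Backpropagation: update statistics from the new node up to the root.
        while node is not None:
            node.visits += 1
            mover = 1 - node.player
            node.total_reward += 1.0 if winner == mover else 0.0
            node = node.parent
    # Recommend the most visited move at the root.
    return max(root.children, key=lambda c: c.visits).move

print(mcts(5, 0, 3000, random.Random(0)))
```

Because the loop can be stopped after any iteration and still return the most visited root move, the sketch also shows the anytime property mentioned earlier. Storing each node's reward from the viewpoint of the player who moved into it is one common way to let both players maximise with the same UCB1 rule.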


The simulation step might appear counter-intuitive—taking random actions seems 
like no good way to play a game—but it provides a relatively unbiased estimate of 
the quality of a game state. Essentially, the better a game state is, the more simu- 
lations are likely to end up winning the game. At least, this is true for games like 
Go where a game will always reach a terminal state within a certain relatively small 
number of moves (400 for Go). For other games like Chess, it is theoretically pos- 
sible to play an arbitrary number of moves without winning or losing the game. 
For many video games, it is probable that any random sequence of actions will not 
end the game unless some timer runs out, meaning that most simulations will be 
very long (tens or hundreds of thousands of steps) and not yield useful information.
For example, in Super Mario Bros (Nintendo, 1985), the application of random actions
would most likely make Mario dance around his starting point until his time is
up [294]. In many cases it is therefore useful to complement the simulation step with
a state evaluation function (as commonly used in Minimax), so that a simulation is
performed for a set number of steps and, if a terminal state is not reached, a state
evaluation is performed in lieu of a win-lose evaluation. In some cases it might even
be beneficial to replace the simulation step entirely with a state evaluation function.

Fig. 2.7 The four basic steps of MCTS exemplified through one iteration of the algorithm. The
figure is a recreation of the corresponding MCTS outline figure by Chaslot et al.
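
A depth-capped rollout with a state evaluation fallback could look like the sketch below. Everything in it is an assumption made for illustration: the function signature, and the toy random-walk "game" in which a walker on positions 0 to 10 loses at 0 and wins at 10.

```python
import random

def truncated_rollout(state, step, is_terminal, terminal_reward, evaluate, max_steps, rng):
    """Random rollout capped at max_steps.

    If a terminal state is reached in time, its true outcome is returned;
    otherwise the (hypothetical) state evaluation function is used instead.
    """
    for _ in range(max_steps):
        if is_terminal(state):
            return terminal_reward(state)
        state = step(state, rng)
    if is_terminal(state):
        return terminal_reward(state)
    return evaluate(state)

# Toy domain: a random walk on the integers 0..10.
def step(pos, rng):
    return pos + rng.choice((-1, 1))

def is_terminal(pos):
    return pos <= 0 or pos >= 10

def terminal_reward(pos):
    return 1.0 if pos >= 10 else -1.0

def evaluate(pos):
    # Heuristic in [-1, 1]: the closer to 10, the better.
    return (pos - 5) / 5.0

value = truncated_rollout(5, step, is_terminal, terminal_reward, evaluate, 3, random.Random(0))
print(value)
```

Setting `max_steps` to 0 replaces the simulation entirely with the evaluation function, which corresponds to the last option mentioned in the paragraph above.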

It is worth noting that there are many variations of the basic MCTS algorithm—it 
may in fact be more useful to see MCTS as an algorithm family or framework rather 
than a single algorithm. 


2.3.4.1 MCTS for Ms Pac-Man 

MCTS can be applied to the real-time control of the Ms Pac-Man agent. There
are obviously numerous ways to represent a game state (and thereby a game tree 
node) and design a reward function for the game, which we will not discuss in detail 
here. In this section, instead, we will outline the approach followed by Pepels et al. 
[524] given its success in obtaining high scores for Ms Pac-Man. Their agent, named
Maastricht, managed to obtain over 87,000 points and was ranked first (among 36
agents) in the Ms Pac-Man competition of the IEEE Computational Intelligence and 
Games conference in 2012. 

When MCTS is used for real-time decision making a number of challenges become
critical. First, the algorithm has a limited computational budget for rollouts, which
increases the importance of heuristic knowledge. Second, the action space can be
particularly fine-grained, which suggests that macro-actions are a more powerful
way to model the game tree; otherwise the agent’s planning will be very short-term.
Third, there might be no terminal node in sight, which calls for good heuristics and
possibly restricting the simulation depth. The MCTS agent of Pepels et al. [524]
managed to cope with all the above challenges of using MCTS for real-time control
by using a restricted game tree and a junction-based game state representation (see
Fig. 2.8).

Fig. 2.8 The junction-based representation of a game state for the Maastricht MCTS controller
[524]. All letter nodes refer to game tree nodes (decisions) for Ms Pac-Man. Image adapted from
[524] with permission from authors.


2.3.5 Further Reading

The basic search algorithms are well covered in Russell and Norvig’s classic AI
textbook [582]. The A* algorithm was invented in 1968 for robot navigation [247];
a good description of the algorithm can be found in Chapter 4 of [582]. There is
plenty of more advanced material on tailoring and optimizing this algorithm for
specific game problems in dedicated game AI books such as [546]. The different
components of Monte Carlo tree search were invented in 2006 and 2007 in
the context of playing Go [142]; a good overview of and introduction to MCTS and
some of its variants is given in a survey paper by Browne et al.


2.4 Evolutionary Computation 


While tree search algorithms start from the root node representing an origin state,
and build a search tree based on the available actions, optimization algorithms do
not build a search tree; they only consider complete solutions, and not the path
taken to get there. As mentioned earlier in Section 2.1, all optimization algorithms
assume that there is something to optimize solutions for; there must be an objective,
alternatively called utility function, evaluation function or fitness function, which
can assign a numerical value (the fitness) to a solution, which can be maximized (or
minimized). Given a utility function, an optimization algorithm can be seen as an
algorithm that searches a search space for solutions that have the highest (or lowest)
value of that utility.

A broad family of optimization algorithms is based on randomized variation of
solutions, where one or multiple solutions are kept at any given time, and new
solutions (or candidates, or search points; different terminology is used by different
authors) are created through randomly changing some of the existing solutions, or
maybe combining some of them. Randomized optimization algorithms which keep
multiple solutions are called evolutionary algorithms, by analogy with natural
evolution.

Another important concept when talking about optimization algorithms (and AI
at large, as covered in Section 2.1) is their representation. All solutions are
represented in some way, for example, as fixed-size vectors of real numbers, or
variable-length strings of characters. Generally, the same artifact can be represented in many
different ways; for example, when searching for a sequence of actions that solves a
maze, the action sequence can be represented in several different ways. In the most
direct representation, the character at step t determines what action to take at time
step t + 1. A somewhat more indirect representation for a sequence of actions would
be a sequence of tuples, where the character in each tuple decides what action to
take and the number paired with it determines for how many time steps n to take that action.
The choice of representation has a big impact on the efficiency and efficacy of the
search algorithm, and there are several tradeoffs at play when making these choices.
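
The two maze representations can be contrasted with a short sketch. The action letters (R, U, L), the function names and the example genomes are hypothetical; the point is only that two different genotypes can decode to the same sequence of actions.

```python
def decode_direct(genome):
    """Direct representation: one character per time step."""
    return list(genome)

def decode_run_length(genome):
    """Indirect representation: (action, duration) tuples."""
    actions = []
    for action, duration in genome:
        actions.extend([action] * duration)
    return actions

direct = "RRRUULLL"
indirect = [("R", 3), ("U", 2), ("L", 3)]

# Both genomes describe the same eight-step plan.
print(decode_direct(direct) == decode_run_length(indirect))
```

The trade-off shows up under variation: changing one character of `direct` alters a single time step, whereas changing one duration in `indirect` shifts every later action in time.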

Optimization is an extremely general concept, and optimization algorithms are
useful for a wide variety of tasks in AI as well as in computing more generally.
Within AI and games, optimization algorithms such as evolutionary algorithms have
been used in many roles as well. In later chapters we explain how optimization
algorithms can be used for searching for game-playing agents, and also for searching for
action sequences (these are two very different uses of optimization that are both in
the context of game-playing); how we can use optimization to create game content
such as levels; and how to use optimization to find player models.


2.4.1 Local Search

The simplest optimization algorithms are the local optimization algorithms. These 
are so called because they only search “locally”, in a small part of the search space, 
at any given time. A local optimization algorithm generally just keeps a single solu- 
tion candidate at any given time, and explores variations of that solution. 

The arguably simplest possible optimization algorithm is the hill climber. In
its most common formulation, which we can call the deterministic formulation, it
works as follows:


1. Initialization: Create a solution s by choosing a random point in search 
space. Evaluate its fitness. 

2. Generate all possible neighbors of s. A neighbor is any solution that differs 
from s by at most a certain given distance (for example, a change in a single 
position). 

3. Evaluate all the neighbors with the fitness function. 

4. If none of the neighbors has a better fitness score than s, exit the algorithm 
and return s. 

5. Otherwise, replace s with the neighbor that has the highest fitness value and 
go to step 2. 
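
The five steps above translate almost line by line into code. The sketch below is illustrative only: the bit-string encoding, the single-bit-flip neighborhood and the "count the ones" fitness function are assumptions chosen to keep the example small, and the starting point is passed in rather than drawn at random.

```python
def hill_climb(start, neighbors, fitness):
    """Deterministic (steepest-ascent) hill climber."""
    current, current_fit = start, fitness(start)          # step 1
    while True:
        candidates = [(fitness(n), n) for n in neighbors(current)]  # steps 2-3
        if not candidates:
            return current, current_fit
        best_fit, best = max(candidates, key=lambda c: c[0])
        if best_fit <= current_fit:                        # step 4: no improvement
            return current, current_fit
        current, current_fit = best, best_fit              # step 5

def bit_flip_neighbors(bits):
    """All solutions that differ from `bits` in exactly one position."""
    return [bits[:i] + (1 - bits[i],) + bits[i + 1:] for i in range(len(bits))]

def one_max(bits):
    """Toy fitness: the number of ones in the bit-string."""
    return sum(bits)

solution, score = hill_climb((0, 1, 0, 0, 1), bit_flip_neighbors, one_max)
print(solution, score)
```

Each iteration evaluates every neighbor, which is why this formulation only scales to representations with small neighborhoods.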


The deterministic hill climber is only practicable when the representation is such
that each solution has a small number of neighbors. In many representations there
is an astronomically high number of neighbors. It is therefore preferable to use
variants of hill climbers that may guide the search effectively. One approach is the
gradient-based hill climber that follows the gradient towards minimizing a cost
function. That algorithmic approach is used, for instance, to train artificial neural
networks (see Section 2.5). Another approach that we cover here is the randomized hill
climber. This instead relies on the concept of mutation: a small, random change
to a solution. For example, a string of letters can be mutated by randomly flipping
one or two characters to some other character (see Fig. 2.9), and a vector of real
numbers can be mutated by adding another vector to it drawn from a random
distribution around zero, and with a very small standard deviation. Macro-mutations
such as gene inversion can also be applied, as visualized in Fig. 2.9.

00111010010
00011011010

(a) Mutation: A number of genes is selected to be mutated with a small probability, e.g., less
than 1%. The selected genes are highlighted with a red outline at the top chromosome and
are mutated by flipping their binary value (red genes) at the bottom chromosome.

00111010010
00010111010

(b) Inversion: Two positions in the offspring are randomly chosen and the positions between
them (the gene sequence highlighted by a red outline at the top chromosome) are inverted
(red genes) at the bottom chromosome.

Fig. 2.9 Two ways of mutating a binary chromosome. In this example we use a chromosome of
eleven genes. A chromosome is selected (top bit-string) and mutated (bottom bit-string).

Given a representation, fitness function and mutation operator, the randomized hill climber
works as follows:


1. Initialization: Create a solution s by choosing a random point in the search 
space. Evaluate its fitness. 

2. Mutation: Generate an offspring s' by mutating s. 

3. Evaluation: Evaluate the fitness of s'. 

4. Replacement: If s' has higher fitness than s, replace s with s'. 

5. Go to step 2. 
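
A minimal sketch of the randomized hill climber follows. It assumes a bit-string representation, a per-gene flip probability of 1/length, and a fixed step budget in place of the endless loop implied by step 5; these choices and all names are illustrative rather than prescribed by the text.

```python
import random

def mutate(bits, rate, rng):
    """Flip each gene independently with probability `rate`."""
    return tuple(1 - b if rng.random() < rate else b for b in bits)

def randomized_hill_climb(length, fitness, steps, rng):
    # Step 1: random starting point and its fitness.
    s = tuple(rng.randint(0, 1) for _ in range(length))
    s_fit = fitness(s)
    for _ in range(steps):                       # step 5, with a budget
        child = mutate(s, 1.0 / length, rng)     # step 2: mutation
        child_fit = fitness(child)               # step 3: evaluation
        if child_fit > s_fit:                    # step 4: replacement
            s, s_fit = child, child_fit
    return s, s_fit

# Toy problem: maximise the number of ones in an eleven-gene chromosome.
best, best_fit = randomized_hill_climb(11, sum, 2000, random.Random(1))
print(best, best_fit)
```

Unlike the deterministic version, each iteration costs a single fitness evaluation regardless of how many neighbors a solution has.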


While very simple, the randomized hill climber can be surprisingly effective. Its
main limitation is that it is liable to get stuck in local optima. A local optimum
is a kind of “dead end” in search space from which there is “no way out”; a point
from which there are no better (higher-fit) points within the immediate vicinity.
There are many ways of dealing with this problem. One is to simply restart the hill
climber at a new randomly chosen point in the search space whenever it gets stuck.
Another is simulated annealing, which accepts moves to solutions with lower fitness
with a given probability; this probability gradually diminishes during the search. A
far more popular response to the problem of local optima is to keep not just a single
solution at any time, but a population of solutions.


2.4.1.1 Local Search for Ms Pac-Man 

While we can think of a few ways one can apply local search in Ms Pac-Man, we
outline an example of its use for controlling path-plans. Local search could, for
instance, evolve short local plans (action sequences) of Ms Pac-Man. A solution
could be represented as a sequence of actions that need to be taken and its fitness could be
determined by the score obtained after following this sequence of actions.


2.4.2 Evolutionary Algorithms 

Evolutionary algorithms are randomized global optimization algorithms; they are 
called global rather than local because they search many points in the search space 
simultaneously, and these points can be far apart. They accomplish this by keeping a 
population of solutions in memory at any given time. The general idea of evolutionary
computation is to optimize by “breeding” solutions: generate many solutions,
throw away the bad ones and keep the good (or at least less bad) ones, and create
new solutions from the good ones.

The idea of keeping a population is taken from Darwinian evolution by natural 
selection, from which evolutionary algorithms also get their name. The size of the 
population is one of the key parameters of an evolutionary algorithm; a population 
size of 1 yields something like a randomized hill climber, whereas populations of
several thousand solutions are not unheard of.

Another idea which is taken from evolution in nature is crossover, also called
recombination. This is the equivalent of sexual reproduction in the natural world; two
or more solutions (called parents) produce an offspring by combining elements of
themselves. The idea is that if we take two good solutions, a solution that is a
combination of these two (or intermediate between them) ought to be good as well,
maybe even better than the parents. The crossover operator is highly dependent on
the solution representation. When the solution is represented as a string or a vector,
operators such as uniform crossover (which flips a fair coin for each position in the
offspring to decide which parent that value is taken from) or one-point crossover
(where a position p in the offspring is randomly chosen, and values of positions
before p are taken from parent 1 and values of positions after p are taken from parent
2) can be used. Crossover can be applied to any chromosome representation varying
from a bit-string to a real-valued vector. Figure 2.10 illustrates these two crossover
operators. It is in no way guaranteed, however, that the crossover operator generates
an offspring that is anywhere near as fit as the parents. In many cases, crossover can
be highly destructive. If crossover is used, it is therefore important that the crossover
operator is chosen with care for each problem. Figure 2.11 illustrates this possibility
through a simple two-dimensional example.
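
The two crossover operators can be sketched as follows. The list-based chromosomes and the function names are assumptions for illustration; each function returns both complementary offspring, and works unchanged on lists of bits or of real numbers.

```python
import random

def one_point_crossover(parent1, parent2, rng):
    """Swap the tails of two equal-length parents at a random point p."""
    p = rng.randrange(1, len(parent1))      # 1 <= p <= len - 1
    child1 = parent1[:p] + parent2[p:]
    child2 = parent2[:p] + parent1[p:]
    return child1, child2

def uniform_crossover(parent1, parent2, rng):
    """Flip a fair coin at every position to decide which parent donates."""
    child1, child2 = [], []
    for g1, g2 in zip(parent1, parent2):
        if rng.random() < 0.5:
            child1.append(g1)
            child2.append(g2)
        else:
            child1.append(g2)
            child2.append(g1)
    return child1, child2

rng = random.Random(0)
zeros, ones = [0] * 11, [1] * 11
a, b = one_point_crossover(zeros, ones, rng)
u, v = uniform_crossover(zeros, ones, rng)
print(a, b)
print(u, v)
```

Using all-zero and all-one parents makes it easy to see which parent each gene came from: the one-point child is a block of zeros followed by a block of ones, while the uniform child is an arbitrary mix.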

(a) 1-point crossover: The vertical line across the two parents denotes the crossover point at
position p.

(b) Uniform crossover: To select genes from each parent to form offspring the operator flips
a fair coin at each position of the chromosome.

Fig. 2.10 Two popular types of crossover used in evolutionary algorithms. In this example we
use a binary representation and a chromosome size of eleven genes. The two bit-strings used in
both crossover operators represent the two parents selected for recombination. Red and blue genes
represent the two different offspring that emerge from each crossover operator. Note that the operators
are directly applicable to real-valued (floating point) representations too.

The basic template for an evolutionary algorithm is as follows:


1. Initialization: The population is filled with N solutions created randomly,
i.e., random points in search space. Known highly-fit solutions can also be
added to this initial population.


2. Evaluation: The fitness function is used to evaluate all solutions in the
population and assign fitness values to them.

3. Parent selection: Based on fitness and possibly other criteria, such as
distance between solutions, those population members that will be used for
reproduction are selected. Selection strategies include methods directly or
indirectly dependent on the fitness of the solutions, including roulette-wheel
(proportionally to fitness), ranking (proportionally to rank in population) and
tournament.

4. Reproduction: Offspring are generated through crossover from parents, or
through simply copying parent solutions, or some combination of these.

5. Variation: Mutation is applied to some or all of the parents and/or offspring. 

6. Replacement: In this step, we select which of the parents and/or offspring 
will make it to the next generation. Popular replacement strategies of the 
current population include the generational (parents die; offspring replace 
them), steady state (offspring replaces worst parent if and only if offspring 
is better) and elitism (generational, but best x% of parents survive) ap- 
proaches. 

7. Termination: Are we done yet? Decide based on how many generations or 
evaluations have elapsed (exhaustion), the highest fitness attained by any 
solution (success), and/or some other termination condition. 

8. Go to step 2. 


Every iteration of the main loop (i.e., every time we reach step 2) is called a
generation, keeping with the nature-inspired terminology. The total number of fitness
evaluations performed is typically proportional to the size of the population times 
the number of generations. 

Fig. 2.11 An illustration of the mutation and crossover operators in a simplified two-dimensional
fitness landscape. The problem is represented by two real-valued variables (x1 and x2) that define
the two genes of the vector chromosome. The fitness landscape is represented by the contour lines
on the 2D plane. Chromosomes 1 and 2 are selected to be parents. They are recombined via 1-point
crossover (dotted arrows) which yields offspring 3 and 4. Both offspring are mutated (solid arrows)
to yield solutions 5 and 6. Operators that lead to poorer-fit or higher-fit solutions are, respectively,
depicted with green and red color.

This high-level template can be implemented and expanded in a myriad different
ways; there are thousands of evolutionary or evolution-like algorithms out there, and
many of them rearrange the overall flow, add new steps and remove existing steps.
In order to make this template a bit more concrete, we will give a simple example of
a working evolutionary algorithm below. This is a form of evolution strategy, one
of the main families of evolutionary algorithms. While the μ + λ evolution strategy
is a simple algorithm that can be implemented in 10 to 20 lines of code, it is a
fully functional global optimizer and quite useful. The two main parameters are μ,
which signifies the “elite” or the size of the part of the population that is kept every
generation, and λ, the size of the part of the population that is re-generated every
generation.


1. Fill the population with μ + λ randomly generated solutions.

2. Evaluate the fitness of all solutions.

3. Sort the population by decreasing fitness, so that the lowest-numbered
solutions have highest fitness.

4. Remove the least fit λ individuals.

5. Replace the removed individuals with copies of the μ best individuals.

6. Mutate the offspring. 

7. Stop if success or exhaustion. Otherwise go to step 2. 
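
The seven steps above fit comfortably in the promised 10 to 20 lines. The sketch below is one possible reading, with several assumptions that are not in the text: real-valued genomes initialised uniformly in [-1, 1], Gaussian mutation with a fixed standard deviation `sigma`, offspring copied from the elite in round-robin order, and a fixed generation budget as the only stopping criterion.

```python
import random

def evolution_strategy(fitness, dim, mu, lam, generations, rng, sigma=0.1):
    # Step 1: mu + lam random solutions.
    population = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(mu + lam)]
    for _ in range(generations):                      # step 7: exhaustion
        # Steps 2-3: evaluate and sort by decreasing fitness.
        population.sort(key=fitness, reverse=True)
        # Step 4: drop the lam least fit individuals.
        elite = population[:mu]
        # Steps 5-6: refill with mutated copies of the elite.
        offspring = []
        for i in range(lam):
            parent = elite[i % mu]
            offspring.append([g + rng.gauss(0, sigma) for g in parent])
        population = elite + offspring
    return max(population, key=fitness)

def negative_sphere(x):
    """Toy fitness with its maximum (0) at the origin."""
    return -sum(g * g for g in x)

best = evolution_strategy(negative_sphere, dim=2, mu=5, lam=5, generations=200,
                          rng=random.Random(0))
print(best, negative_sphere(best))
```

Because the elite survives unchanged, the best fitness in the population can never decrease from one generation to the next. Self-adaptation of `sigma`, mentioned in the text as typical of evolution strategies, is deliberately left out of this sketch.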


Evolution strategies, the type of algorithms which the μ + λ evolution strategy
above is a simple example of, are characterized by a reliance on mutation rather
than crossover to create variation, and by the use of self-adaptation to adjust
mutation parameters (though that is not part of the simple algorithm above). They are
also generally well suited to optimize artifacts represented as vectors of real
numbers, so-called continuous optimization. Some of the very best algorithms for
continuous optimization, such as the covariance matrix adaptation evolution strategy
(CMA-ES) [245] and the natural evolution strategy (NES) [753], are conceptual
descendants of this family of algorithms.

Another prominent family of evolutionary algorithms is genetic algorithms 
(GAs). These are characterized by a reliance on crossover rather than mutation for 
variation (some genetic algorithms have no mutation at all), fitness-proportional
selection and solutions being often represented as bit-strings or other discrete strings.
It should be noted, however, that the distinctions between different types of
evolutionary algorithms are mainly based on their historical origins. These days, there are
so many variations and such extensive hybridization that it often makes little sense 
to categorize a particular algorithm as belonging to one or the other family. 

A variant of evolutionary algorithms emerges from the need of satisfying
particular constraints within which a solution is not only fit but also feasible. When
evolutionary algorithms are used for constrained optimization we are faced with a
number of challenges, such as that mutation and crossover cannot preserve or
guarantee the feasibility of a solution. It may very well be that a mutation or a
recombination between two parents yields an infeasible offspring. One approach to
constraint handling is repair, which could be any process that turns infeasible
individuals into feasible ones. A second approach is to modify the genetic operators
so that the probability of an infeasible individual appearing becomes smaller.
A popular approach is to merely penalize the existence of infeasible solutions by
assigning them low fitness values or, alternatively, penalties in proportion to the number of
constraint violations. This strategy however may over-penalize the actual fitness of
a solution, which in turn will result in its rapid elimination from the population. Such
a property might be undesirable and is often blamed for the weak performance of
evolutionary algorithms on handling constraints [456]. As a response to this
limitation the feasible-infeasible 2-population (FI-2pop) algorithm [341] evolves two
populations, one with feasible and one with infeasible solutions. The infeasible
population optimizes its members towards minimizing the distance from feasibility. As
the infeasible population converges to the border of feasibility, the likelihood of
discovering new feasible individuals increases. Feasible offspring of infeasible parents
are transferred to the feasible population, boosting its diversity (and vice versa for
infeasible offspring). FI-2pop has been used in games on instances where we require
fit and feasible solutions such as well-designed and playable game levels [649, 379].

Finally, another blend of evolutionary algorithms considers more than one ob- 
jective when attempting to find a solution to a problem. For many problems it is 
hard to combine all requirements and specifications into a single objective mea- 
sure. It is also often true that these objectives are conflicting; for instance, if our 
objectives are to buy the fastest and cheapest possible laptop we will soon realize 
the two objectives are partially conflicting. The intuitive solution is to merely add 
the different objective values, as a weighted sum, and use this as your fitness
under optimization. Doing so, however, has several drawbacks such as the non-trivial
ad-hoc design of the weighting among the objectives, the lack of insight on the
interactions between the objectives (e.g., what is the price threshold above which faster
laptops are not more expensive?) and the fact that a weighted-sum single-objective
approach cannot reach solutions that achieve an optimal compromise among their
weighted objectives. The response to these limitations is the family of algorithms
known as multiobjective evolutionary algorithms. A multiobjective evolutionary
algorithm considers at least two objective functions (that are partially conflicting)
and searches for a Pareto front of these objectives. The Pareto front contains
solutions that cannot be improved in one objective without worsening in another. Further
details about multiobjective optimization by means of evolutionary algorithms can
be found in [126]. The approach is applicable in game AI on instances where more
than one objective is relevant for the problem we attempt to solve; for instance, we 
might wish to optimize both the balance and the asymmetry of a strategy game map 
HtTHItTI, or design non-player characters that are interestingly diverse in their 
behavioral space 0. 
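
The notion of a Pareto front can be made concrete with the laptop example. The sketch below only computes the non-dominated set of a fixed list of candidates; it is not a multiobjective evolutionary algorithm. The laptop figures are invented for illustration, and price is negated so that both objectives are maximised.

```python
def dominates(a, b):
    """True if `a` is at least as good as `b` in every objective and
    strictly better in at least one (all objectives are maximised)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical laptops as (speed in GHz, negated price).
laptops = [(3.0, -900), (2.5, -700), (2.0, -800), (3.5, -1500), (1.5, -500)]
front = pareto_front(laptops)
print(front)
```

Here the (2.0, -800) laptop is dominated, because another candidate is both faster and cheaper; the remaining four each represent a different compromise between speed and price.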


2.4.2.1 Evolutionary Algorithms for Ms Pac-Man 

A simple way to employ evolutionary algorithms (EAs) in Ms Pac-Man is as
follows. You could first identify a number of important parameters that
Ms Pac-Man must consider for taking the right decision on where to move next.
These parameters, for instance, could be the current placement of ghosts, the
presence of power pills, the number of pellets available on the level and so on. The next
step would be to design a utility function as the weighted sum of these parameters.
At each junction, Ms Pac-Man would need to consult its utility function for all its
possible moves and pick the move with the highest utility. The weights of the utility
function are unknown of course and this is where an EA can be of help by evolving
the weights of the utility so that they optimize the score for Ms Pac-Man. In other
words, the fitness of each chromosome (weight vector of utility) is determined by
the score obtained from Ms Pac-Man within a number of simulation steps, or game
levels played.


2.4.3 Further Reading

We recommend three books for further reading on evolutionary computation: Eiben
and Smith’s Introduction to Evolutionary Computing, Ashlock’s Evolutionary
Computation for Modeling and Optimization and, finally, the genetic programming
field guide by Poli et al. [536].


2.5 Supervised Learning 

Supervised learning is the algorithmic process of approximating the underlying
function between labeled data and their corresponding attributes or features [49].
A popular example of supervised learning is that of a machine that is asked to
distinguish between apples and pears (labeled data) given a set of features or data
attributes such as the fruits’ color and size. Initially, the machine learns to
distinguish apples from pears by seeing a number of available fruit examples, which
contain the color and size of each fruit, on one hand, and their corresponding label
(apple or pear) on the other. After learning is complete, the machine should ideally
be able to tell whether a new and unseen fruit is a pear or an apple based solely on its
color and size. Beyond distinguishing between apples and pears, supervised learning
is nowadays used in a plethora of applications including financial services, medical
diagnosis, fraud detection, web page categorization, image and speech recognition,
and user modeling (among many others).

Evidently, supervised learning requires a set of labeled training examples; hence
supervised. More specifically, the training signal comes as a set of supervised labels
on the data (e.g., this is an apple whereas that one is a pear) which acts upon a set
of characterizations of these labels (e.g., this apple has red color and medium size).
Consequently, each data example comes as a pair of a set of labels (or outputs) and
features that correspond to these labels (or inputs). The ultimate goal of supervised
learning is not merely to learn from the input-output pairs but to derive a function
that approximates (better, imitates) their relationship. The derived function should
map well to new and unseen instances of input and output pairs (e.g., unseen
apples and pears in our example), a property that is called generalization. Here
are some examples of input-output pairs one can meet in games that make supervised
learning relevant: {player health, own health, distance to player} → {action (shoot,
flee, idle)}; {player’s previous position, player’s current position} → {player’s next
position}; {number of kills and headshots, ammo spent} → {skill rating}; {score,
map explored, average heart rate} → {level of player frustration}; {Ms Pac-Man
and ghosts position, pellets available} → {Ms Pac-Man direction}.

Formally, supervised learning attempts to derive a function $f : X \to Y$, given a
set of $N$ training examples $\{(\mathbf{x}_1, \mathbf{y}_1), \ldots, (\mathbf{x}_N, \mathbf{y}_N)\}$, where $X$ and $Y$ are the input and
output space, respectively; $\mathbf{x}_i$ is the feature (input) vector of the $i$-th example and $\mathbf{y}_i$
is its corresponding set of labels. A supervised learning task has two core steps. In
the first, training step, the training samples (attributes and corresponding labels)
are presented and the function $f$ between attributes and labels is derived. As we will
see in the list of algorithms below, $f$ can be represented as a number of classification
rules, decision trees, or mathematical formulae. In the second, testing step, $f$ can
be used to predict the labels of unknown data given their attributes. To validate the
generalizability of $f$ and to avoid overfitting to the data [49], it is common practice
to evaluate $f$ on a new independent (test) dataset using a performance measure
such as accuracy, which is the percentage of test samples that are correctly predicted
by our trained function. If the accuracy is acceptable, we can use $f$ to predict new
data samples.
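
The two steps can be illustrated with the apples-and-pears example. The sketch below is only one possible instance: it assumes a 1-nearest-neighbor classifier as the learned function and uses invented (redness, size) feature values; neither choice comes from the text.

```python
def nearest_neighbor_predict(train, x):
    """Predict the label of x as the label of its closest training example."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    _, label = min(train, key=lambda pair: sq_dist(pair[0], x))
    return label

def accuracy(train, test):
    """Fraction of independent test samples whose label is predicted correctly."""
    correct = sum(nearest_neighbor_predict(train, x) == y for x, y in test)
    return correct / len(test)

# Training step: labeled examples, each a ((redness, size), label) pair (invented)
train = [((0.9, 0.5), "apple"), ((0.8, 0.6), "apple"),
         ((0.3, 0.8), "pear"), ((0.2, 0.7), "pear")]

# Testing step: unseen examples; the last one is a pale, large apple
test = [((0.85, 0.55), "apple"), ((0.25, 0.75), "pear"), ((0.5, 0.65), "apple")]

print(accuracy(train, test))  # two of the three test fruits are classified correctly
```

The misclassified third fruit shows why accuracy is measured on data that played no part in training: the training set alone would suggest a perfect classifier.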

But how do we derive this function $f$? In general, an algorithmic process modifies
the parameters of this function so that we achieve a good match between the given
labels of our training samples and the function we attempt to approximate. There
are numerous ways to find and represent that function, each one corresponding to
a different supervised learning algorithm. These include artificial neural networks,
case-based reasoning, decision tree learning, random forests, Gaussian regression,
naive Bayes classifiers, k-nearest neighbors, and support vector machines [49]. The
variety of supervised learning algorithms available is, in part, explained by the fact
that there is no single learning algorithm that works best on all supervised learning
problems out there. This is widely known as the no free lunch theorem [756].

Before covering the details of particular algorithms we should stress that the data
type of the label determines the output type and, in turn, the type of the supervised
learning approach that can be applied. We can identify three main types of
supervised learning algorithms depending on the data type of the labels (outputs).
First, we meet classification [49] algorithms which attempt to predict categorical
class labels (discrete or nominal) such as the apples and pears of the previous
example or the level in which a player will achieve her maximum score. Second, if
the output data comes as an interval, such as the completion time of a game level
or retention time, the supervised learning task is metric regression [49]. Finally,
preference learning [215] predicts ordinal outputs such as ranks and preferences
and attempts to derive the underlying global order that characterizes those ordinal
labels. Examples of ordinal outputs include the ranked preferences of variant camera
viewpoints, or a preference for a particular sound effect over others. The training
signal in the preference learning paradigm provides information about the relative
relation between instances of the phenomenon we attempt to approximate, whereas
regression and classification provide information, respectively, about the intensity
and the classes of the phenomenon.

In this book, we focus on a subset of the most promising and popular supervised
learning algorithms for game AI tasks such as game playing, player behavior
imitation or player preference prediction, which are covered in other chapters of
this book. The three algorithms outlined in the remainder of this section are artificial
neural networks, support vector machines and decision tree learning. All three
supervised learning algorithms covered can be used for classification, prediction or
preference learning tasks.

Fig. 2.12 An illustration of an artificial neuron. The neuron is fed with the input vector $\mathbf{x}$ through
$n$ connections with corresponding weight values $\mathbf{w}$. The neuron processes the input by calculating
the weighted sum of inputs and corresponding connection weights and adding a bias weight ($b$):
$\mathbf{x} \cdot \mathbf{w} + b$. The resulting value feeds an activation function ($g$), the value of which defines the
output of the neuron.


2.5.1 Artificial Neural Networks 


Artificial Neural Networks (ANNs) are a bio-inspired approach for computational
intelligence and machine learning. An ANN is a set of interconnected processing
units (named neurons) which was originally designed to model the way a biological
brain, containing over $10^{11}$ neurons, processes information, operates, learns
and performs in several tasks. Biological neurons have a cell body, a number of
dendrites which bring information into the neuron and an axon which transmits
electrochemical information outside the neuron. The artificial neuron (see Fig. 2.12)
resembles the biological neuron as it has a number of inputs $\mathbf{x}$ (corresponding to
the neuron dendrites) each with an associated weight parameter $\mathbf{w}$ (corresponding
to the synaptic strength). It also has a processing unit that combines inputs with
their corresponding weights via an inner product (weighted sum) and adds a bias
(or threshold) weight $b$ to the weighted sum as follows: $\mathbf{x} \cdot \mathbf{w} + b$. This value is then
fed to an activation function $g$ (cell body) that yields the output of the neuron
(corresponding to an axon terminal). ANNs are essentially simple mathematical models
defining a function $f : \mathbf{x} \to \mathbf{y}$.
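
A single artificial neuron is small enough to write out directly. The sketch below assumes the logistic function as the activation $g$ (one common choice); the input and weight values are invented so that the weighted sum is exactly zero.

```python
import math

def logistic(z):
    """Sigmoid-shaped activation function."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x, w, b, g=logistic):
    """Output of one neuron: activation of the weighted sum x . w plus bias b."""
    z = sum(xi * wi for xi, wi in zip(x, w)) + b
    return g(z)

# 1.0 * 0.4 + 0.5 * (-0.8) + 0.0 = 0, and the logistic function maps 0 to 0.5
print(neuron([1.0, 0.5], [0.4, -0.8], 0.0))
```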

Various forms of ANNs are applicable for regression analysis, classification,
and preference learning, and even unsupervised learning (via, e.g., Hebbian learning
[256] and self-organizing maps [471]). Core application areas include pattern
recognition, robot and agent control, game-playing, decision making, gesture, speech and
text recognition, medical and financial applications, affective modeling, and image
recognition. The main benefit of ANNs compared to other supervised learning
approaches is their capacity to approximate any continuous real-valued function given
sufficiently large ANN architectures and computational resources [348, 152]. This
capacity characterizes ANNs as universal approximators [279].


2.5.1.1 Activation Functions 


Which activation function should one use in an ANN? The original model of a
neuron by McCulloch and Pitts [450] in 1943 featured a Heaviside step activation
function which either allows the neuron to fire or not. When such neurons are
employed and connected to a multi-layered ANN the resulting network can merely
solve linearly separable problems. The algorithm that trains such ANNs was
invented in 1958 [576] and is known as Rosenblatt’s perceptron algorithm.
Non-linearly separable problems such as the exclusive-or gate could only be solved after
the invention of the backpropagation algorithm in 1975 [752]. Nowadays, there
are several activation functions used in conjunction with ANNs and their training.
The choice of activation function, in turn, yields different types of ANNs.
Examples include the Gaussian activation function that is used in radial basis function
(RBF) networks and the numerous types of activation functions that can be
used in compositional pattern producing networks (CPPNs) [653]. The most
common function used for ANN training is the sigmoid-shaped logistic function
($g(x) = 1/(1 + e^{-x})$) for the following properties: 1) it is bounded, monotonic and
non-linear; 2) it is continuous and smooth; and 3) its derivative is calculated trivially
as $g'(x) = g(x)(1 - g(x))$. Given the properties above, the logistic function can be
used in conjunction with gradient-based optimization algorithms such as
backpropagation, which is described below. Other popular activation functions for training
deep architectures of neural networks include the rectifier, named rectified linear
unit (ReLU) when employed in a neuron, and its smooth approximation, the
softplus function [231]. Compared to sigmoid-shaped activation functions, ReLUs
allow for faster and (empirically) more effective training of deep ANNs, which are
generally trained on large datasets (see more in Section 2.5.1.6).
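
The activation functions named above are easy to state in code. The following sketch uses only Python's standard math module and checks the derivative identity of the logistic function numerically; the evaluation point 0.3 and the step size are arbitrary choices made for the example.

```python
import math

def logistic(x):
    """g(x) = 1 / (1 + e^(-x))"""
    return 1.0 / (1.0 + math.exp(-x))

def logistic_derivative(x):
    """g'(x) = g(x) * (1 - g(x))"""
    g = logistic(x)
    return g * (1.0 - g)

def relu(x):
    """Rectifier: passes positive inputs through and clips the rest to zero."""
    return max(0.0, x)

def softplus(x):
    """Smooth approximation of the rectifier: log(1 + e^x)."""
    return math.log1p(math.exp(x))

# Compare the closed-form derivative against a central finite difference
h = 1e-6
numeric = (logistic(0.3 + h) - logistic(0.3 - h)) / (2 * h)
print(abs(numeric - logistic_derivative(0.3)) < 1e-8)

# Far from zero the softplus is nearly indistinguishable from the rectifier
print(softplus(20.0) - relu(20.0))
```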


2.5.1.2 From a Neuron to a Network 


To form an ANN a number of neurons need to be structured and connected. While
numerous ways have been proposed in the literature, the most common of them all
is to structure neurons in layers. In its simplest form, known as the multi-layer
perceptron (MLP), neurons in an ANN are arranged across one or more layers but
not connected to other neurons in the same layer (see Fig. 2.13 for a typical MLP
structure). The output of each neuron in each layer is connected to all the neurons
in the next layer. Note that a neuron’s output value feeds merely the neurons of
the next layer and, thereby, becomes their input. Consequently, the outputs of the
neurons in the last layer are the outputs of the ANN. The last layer of the ANN is
also known as the output layer whereas all intermediate layers between the output
and the input are the hidden layers. It is important to note that the inputs of the
ANN, $\mathbf{x}$, are connected to all the neurons of the first hidden layer. We illustrate this
with an additional layer we call the input layer. The input layer does not contain
neurons as it only distributes the inputs to the first layer of neurons. In summary,
MLPs are 1) layered because their neurons are grouped in layers; 2) feed-forward because
their connections are unidirectional and always forward (from a previous layer to
the next); and 3) fully connected because every neuron is connected to all neurons
of the next layer.

Fig. 2.13 An MLP example with three inputs, one hidden layer containing four hidden neurons
and two outputs. The ANN has labeled and ordered neurons and example connection weight labels.
Bias weights $b_j$ are not illustrated in this example but are connected to each neuron $j$ of the ANN.


2.5.1.3 Forward Operation 


In the previous section we defined the core components of an ANN whereas in this 
section we will see how we compute the output of the ANN when an input pattern 
is presented. The process is called forward operation and propagates the inputs of 
the ANN throughout its consecutive layers to yield the outputs. The basic steps of 
the forward operation are as follows:


1. Label and order neurons. We typically start numbering at the input layer and
increment the numbers towards the output layer (see Fig. 2.13). Note that
the input layer does not contain neurons; nevertheless, it is treated as such for
numbering purposes only.

2. Label connection weights assuming that $w_{ij}$ is the connection weight from
neuron $i$ (pre-synaptic neuron) to neuron $j$ (post-synaptic neuron). Label
bias weights that connect to neuron $j$ as $b_j$.

3. Present an input pattern $\mathbf{x}$.

4. For each neuron $j$ compute its output as follows: $a_j = g(\sum_i \{w_{ij} a_i\} + b_j)$,
where $a_j$ and $a_i$ are, respectively, the output of and the inputs to neuron
$j$ (n.b. $a_i = x_i$ in the input layer); $g$ is the activation function (usually the
logistic sigmoid function).

5. The outputs of the neurons of the output layer are the outputs of the ANN. 
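
The steps above can be sketched as a short Python function. The layer representation used here, a list of (W, b) pairs with W[i][j] holding the weight from neuron i of the previous layer to neuron j, is an assumption made for the example, as are the all-zero weights of the demonstration network.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, layers, g=logistic):
    """Propagate input pattern x through consecutive fully connected layers.

    layers is a list of (W, b) pairs; W[i][j] is the weight from neuron i of
    the previous layer to neuron j, and b[j] is the bias weight of neuron j.
    """
    a = list(x)  # in the input layer, a_i = x_i
    for W, b in layers:
        a = [g(sum(W[i][j] * a[i] for i in range(len(a))) + b[j])
             for j in range(len(b))]
    return a  # outputs of the last layer are the outputs of the ANN

# Three inputs, four hidden neurons, two outputs (the shape used in Fig. 2.13).
# With all weights at zero every neuron receives 0 and outputs g(0) = 0.5.
hidden = ([[0.0] * 4 for _ in range(3)], [0.0] * 4)
output = ([[0.0] * 2 for _ in range(4)], [0.0] * 2)
print(forward([1.0, 2.0, 3.0], [hidden, output]))  # [0.5, 0.5]
```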


2.5.1.4 How Does an ANN Learn? 

How do we approximate $f(\mathbf{x}; \mathbf{w}, \mathbf{b})$ so that the outputs of the ANN match the desired
outputs (labels) of our dataset, $\mathbf{y}$? We will need a training algorithm that adjusts the
weights ($\mathbf{w}$ and $\mathbf{b}$) so that $f : \mathbf{x} \to \mathbf{y}$. Such a training algorithm requires two
components. First, it requires a cost function to evaluate the quality of any set of
weights. Second, it requires a search strategy within the space of possible solutions
(i.e., the weight space). We outline these aspects in the following two subsections.


Cost (Error) Function 


Before we attempt to adjust the weights to approximate $f$, we need some measure of
MLP performance. The most common performance measure for training ANNs in
a supervised manner is the squared Euclidean distance (error) between the vectors
of the actual output of the ANN ($\mathbf{a}$) and the desired labeled output $\mathbf{y}$ (see equation
(2.2)).

$$E = \frac{1}{2} \sum_{j} (y_j - a_j)^2 \tag{2.2}$$

where the sum is taken over all the output neurons (the neurons in the final layer).
Note that the $y_j$ labels are constant values and, more importantly, that $E$ is
a function of all the weights of the ANN since the actual outputs depend on them.
As we will see below, ANN training algorithms build strongly upon this relationship
between error and weights.
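
As a small illustration, the error of one training pattern can be computed as follows. The factor of one half is the common convention that simplifies the derivative and is treated here as an assumption; the output and label values are invented.

```python
def squared_error(outputs, targets):
    """Squared Euclidean error between ANN outputs a_j and labels y_j.

    The 1/2 factor is a convention (an assumption here) that cancels the
    exponent when the error is differentiated.
    """
    return 0.5 * sum((y - a) ** 2 for a, y in zip(outputs, targets))

# Two output neurons: 0.5 * (0.2^2 + 0.1^2), which is roughly 0.025
print(squared_error([0.8, 0.1], [1.0, 0.0]))
```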






Backpropagation 

The backpropagation (or backprop) [579] algorithm is based on gradient descent
optimization and is arguably the most common algorithm for training ANNs.
Backpropagation stands for backward propagation of errors as it calculates weight
updates that minimize the error function, which we defined earlier in (2.2), from the
output to the input layer. In a nutshell, backpropagation computes the partial derivative
(gradient) of the error function $E$ with respect to each weight of the ANN and
adjusts the weights of the ANN following the (opposite direction of the) gradient that
minimizes $E$.

As mentioned earlier, the squared Euclidean error of (2.2) depends on the weights,
since the ANN output is essentially the function $f(\mathbf{x}; \mathbf{w}, \mathbf{b})$. As such we can
calculate the gradient of $E$ with respect to any weight ($\partial E / \partial w_{ij}$) and any bias weight ($\partial E / \partial b_j$)
in the ANN, which in turn will determine the degree to which the error will change if
we change the weight values. We can then determine how much of such change we
desire through a parameter $\eta \in [0, 1]$ called the learning rate. In the absence of any
information about the general shape of the function between the error and the weights,
but with information about its gradient available, a gradient descent
approach appears to be a good fit for attempting to find the global minimum of
the $E$ function. Given the lack of information about the $E$ function, the search can
start from some random point in the weight space (i.e., random initial weight values)
and follow the gradient towards lower $E$ values. This process is repeated iteratively
until we reach $E$ values we are happy with or we run out of computational resources.

More formally, the basic steps of the backpropagation algorithm are as follows: 


1. Initialize $\mathbf{w}$ and $\mathbf{b}$ to random (commonly small) values.

2. For each training pattern (input-output pair):

(a) Present input pattern $\mathbf{x}$, ideally normalized to a range (e.g., [0,1]).

(b) Compute ANN actual outputs $a_j$ using the forward operation.

(c) Compute $E$ according to (2.2).

(d) Compute error derivatives with respect to each weight, $\partial E / \partial w_{ij}$, and bias
weight, $\partial E / \partial b_j$, of the ANN from the output all the way to the input layer.

(e) Update weights and bias weights as $\Delta w_{ij} = -\eta \, \partial E / \partial w_{ij}$ and $\Delta b_j = -\eta \, \partial E / \partial b_j$,
respectively.

3. If $E$ is small or you are out of computational budget, stop! Otherwise go to
step 2.


Note that we do not wish to detail the derivative calculations of step 2(d) as doing
so would be out of scope for this book. We instead refer the interested reader to the
original backpropagation paper [579] for the exact formulas and to the reading list
at the end of this section.
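
For readers who prefer code to formulas, the algorithm can be sketched compactly in Python. This is a minimal illustration under several assumptions that do not come from the text: logistic activations in every layer, a squared error with a 1/2 factor, a 2-3-1 network, the XOR patterns as training data, and a fixed number of epochs as the stopping criterion.

```python
import math
import random

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_layer(n_in, n_out, rng):
    """Step 1: small random weights W[i][j] (from i to j) and biases b[j]."""
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_in)]
    b = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return W, b

def forward(x, layers):
    """Step 2(b): return the activations of every layer, inputs included."""
    activations = [list(x)]
    for W, b in layers:
        prev = activations[-1]
        activations.append([
            logistic(sum(W[i][j] * prev[i] for i in range(len(prev))) + b[j])
            for j in range(len(b))])
    return activations

def error(outputs, targets):
    """Step 2(c): squared error, with the 1/2 factor assumed as a convention."""
    return 0.5 * sum((y - a) ** 2 for a, y in zip(outputs, targets))

def gradients(x, y, layers):
    """Step 2(d): (dE/dW, dE/db) per layer, computed from output to input."""
    acts = forward(x, layers)
    # delta_j is dE/dz_j; at the output layer it equals (a_j - y_j) * g'(z_j)
    delta = [(a - t) * a * (1.0 - a) for a, t in zip(acts[-1], y)]
    grads = []
    for k in range(len(layers) - 1, -1, -1):
        W, _ = layers[k]
        prev = acts[k]
        dW = [[prev[i] * d for d in delta] for i in range(len(prev))]
        grads.append((dW, list(delta)))
        if k > 0:  # pass the error signal back through the weights of layer k
            delta = [prev[i] * (1.0 - prev[i]) *
                     sum(W[i][j] * delta[j] for j in range(len(delta)))
                     for i in range(len(prev))]
    grads.reverse()
    return grads

def update(layers, grads, eta):
    """Step 2(e): move every weight against its error derivative."""
    for (W, b), (dW, db) in zip(layers, grads):
        for i in range(len(W)):
            for j in range(len(b)):
                W[i][j] -= eta * dW[i][j]
        for j in range(len(b)):
            b[j] -= eta * db[j]

def train(data, layers, eta=0.5, epochs=2000):
    for _ in range(epochs):       # step 3: here simply a fixed budget
        for x, y in data:         # step 2: one training pattern at a time
            update(layers, gradients(x, y, layers), eta)

rng = random.Random(1)
layers = [init_layer(2, 3, rng), init_layer(3, 1, rng)]
xor = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]
train(xor, layers)
print([forward(x, layers)[-1][0] for x, _ in xor])
```

Whether the printed outputs end up close to the XOR targets depends on the random initialization, which is exactly the sensitivity to starting weights that any gradient-based local search exhibits.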

Limitations and Solutions

It is worth noting that backpropagation is not guaranteed to find the global minimum
of $E$ given its local search (hill-climbing) property. Further, given its gradient-based
(local) search nature, the algorithm fails to overcome potential plateau areas in
the error function landscape. As these are areas with near-zero gradient, crossing
them results in near-zero weight updates and, further, in premature convergence of
the algorithm. Typical solutions and enhancements of the algorithm to overcome
convergence to local minima include:

• Random restarts: One can rerun the algorithm with new random connection
weight values in the hope that the ANN is not too dependent on luck. No ANN
model is good if it depends too much on luck, for instance, if it performs well
only in one or two out of ten runs.

• Dynamic learning rate: One can either modify the learning rate parameter and
observe changes in the performance of the ANN or introduce a dynamic learning
rate parameter that increases when convergence is slow and decreases
when convergence to lower $E$ values is fast.

• Momentum: Alternatively, one may add a momentum amount to the weight
update rule as follows:

$$\Delta w_{ij}^{(t)} = m \Delta w_{ij}^{(t-1)} - \eta \frac{\partial E}{\partial w_{ij}} \tag{2.3}$$

where $m \in [0, 1]$ is the momentum parameter and $t$ is the iteration step of the
weight update. The addition of a momentum value of the previous weight
update ($m \Delta w_{ij}^{(t-1)}$) attempts to help backpropagation to overcome a potential local
minimum.

While the above solutions are directly applicable to ANNs of small size, practical
wisdom and empirical evidence with modern (deep) ANN architectures
suggest that the above drawbacks are largely eliminated [366].
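
The momentum idea can be illustrated on a single weight. The sketch assumes the usual form of the rule, in which the new update is the previous update scaled by $m$ minus the learning rate times the gradient; the toy error function $E(w) = w^2$ and the parameter values are invented for the example.

```python
def momentum_step(w, prev_delta, grad, eta=0.1, m=0.9):
    """One weight update with momentum; returns the new weight and its update."""
    delta = m * prev_delta - eta * grad
    return w + delta, delta

# Toy error surface E(w) = w^2, whose gradient is 2w and whose minimum is w = 0
w, delta = 1.0, 0.0
for _ in range(200):
    w, delta = momentum_step(w, delta, grad=2.0 * w)
print(w)  # close to 0 after 200 steps
```

Since part of the previous update carries over, the weight keeps moving even where the gradient is nearly zero, which is the property that helps on plateaux and in shallow local minima.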


Batch vs. Non-batch Training 

Backpropagation can be employed following a batch or a non-batch learning mode.
In non-batch mode, weights are updated every time a training sample is presented
to the ANN. In batch mode, weights are updated after all training samples are
presented to the ANN. In that case, errors are accumulated over the samples of the
batch prior to the weight update. The non-batch mode is more unstable as it
iteratively relies on a single data point; however, this might be beneficial for avoiding
convergence to a local minimum. The batch mode, on the other hand, is naturally
a more stable gradient descent approach as weight updates are driven by the
average error of all training samples in the batch. To best utilize the advantages of both
approaches it is common to apply batch learning on randomly selected samples in
small batch sizes.
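
The compromise described last can be sketched as follows. To keep the example short it trains a hypothetical one-weight linear model rather than an ANN; the data, the learning rate of 0.5 and the batch size of 4 are all assumptions.

```python
import random

def minibatches(data, batch_size, rng):
    """Yield the data in randomly composed batches of at most batch_size."""
    indices = list(range(len(data)))
    rng.shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [data[i] for i in indices[start:start + batch_size]]

def batch_gradient(batch, w):
    """Average gradient of 0.5 * (w * x - y)^2 over the samples of a batch."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)

# Invented data generated by y = 3x; training should recover w close to 3
data = [(0.1 * k, 3.0 * 0.1 * k) for k in range(1, 11)]
rng = random.Random(0)
w = 0.0
for epoch in range(200):
    for batch in minibatches(data, batch_size=4, rng=rng):
        w -= 0.5 * batch_gradient(batch, w)  # one update per batch
print(w)
```

Setting batch_size to 1 gives the non-batch mode and setting it to len(data) gives the full batch mode, so the same loop covers all three variants.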

2.5.1.5 Types of ANNs

Beyond the standard feedforward MLP there are numerous other types of ANN used
for classification, regression, preference learning, data processing and filtering, and
clustering tasks. Notably, recurrent neural networks (such as Hopfield networks
[278], Boltzmann machines [4] and Long Short-Term Memory [266]) allow
connections between neurons to form directed cycles, thus enabling an ANN to capture
dynamic and temporal phenomena (e.g., time-series processing and prediction).
Further, there are ANN types mostly used for clustering and data dimensionality
reduction such as Kohonen self-organizing maps [471] and autoencoders.


2.5.1.6 From Shallow to Deep 

A critical parameter for ANN training is the size of the ANN. So, how wide and 
deep should my ANN architecture be to perform well on this particular task? While 
there is no formal and definite answer to this question, there is a generally accepted 
rule-of-thumb suggesting that the size of the network should match the complexity 
of the problem. According to Goodfellow et al. in their deep learning book [231],
an MLP is essentially a deep (feedforward) neural network. Its depth is determined 
by the number of hidden layers it contains. Goodfellow et al. state that “It is from 
this terminology that the name deep learning arises”. On that basis, training of 
ANN architectures containing (at least) a hidden layer can be viewed as a deep 
learning task whereas single output-layered architectures can be viewed as shallow. 
Various methods have been introduced in recent years to enable training of deep 
architectures containing several layers. The methods largely rely on gradient search 
and are covered in detail in [231] for the interested reader.


2.5.1.7 ANNs for Ms Pac-Man 

As with every other method in this chapter we will attempt to employ ANNs in 
the Ms Pac-Man game. One straightforward way to use ANNs in Ms Pac-Man is 
to attempt to imitate expert players of the game. Thus, one can ask experts to play 
the game and record their playthroughs, through which a number of features can 
be extracted and used as the input of the ANN. The resolution of the ANN input 
may vary from simple statistics of the game—such as the average distance between 
ghosts and Ms Pac-Man—to detailed pixel-to-pixel RGB values of the game level 
image. The output data, on the other hand, may contain the actions selected by 
Ms Pac-Man in each frame of the game. Given the input and desired output pairs, 
the ANN is trained via backpropagation to predict the action performed by expert 
players (ANN output) given the current game state (ANN input). The size (width
and depth) of the ANN depends on both the amount of data available from the 
expert Ms Pac-Man players and the size of the input vector considered. 
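
Preparing the recorded playthroughs for training could look like the sketch below. The four-action set, the one-hot encoding of the desired output and the feature values are assumptions made for illustration; the text does not prescribe a particular encoding.

```python
ACTIONS = ["up", "down", "left", "right"]  # assumed action set

def one_hot(action):
    """Desired ANN output: one output neuron per action, 1.0 for the chosen one."""
    return [1.0 if a == action else 0.0 for a in ACTIONS]

def build_dataset(recorded_frames):
    """Turn recorded (features, action) frames into (input, target) pairs."""
    return [(list(features), one_hot(action))
            for features, action in recorded_frames]

def decode(outputs):
    """Map ANN outputs back to the action with the most active output neuron."""
    return ACTIONS[max(range(len(outputs)), key=lambda j: outputs[j])]

# Invented frames: (distance to nearest ghost, distance to nearest pill) and action
frames = [((0.2, 0.7), "left"), ((0.9, 0.1), "up")]
print(build_dataset(frames))
print(decode([0.1, 0.2, 0.6, 0.1]))  # left
```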

2.5.2 Support Vector Machines

Support vector machines (SVMs) [139] are an alternative and very popular set of
supervised learning algorithms that can be used for classification, regression [179]
and preference learning [302] tasks. A support vector machine is a binary linear
classifier that is trained so as to maximize the margin between the training examples
of the separate classes in the data (e.g., apples and pears). As with every other
supervised learning algorithm, the attributes of new and unseen examples are fed to the
SVM, which predicts the class they belong to. SVMs have been used widely for text
categorization, speech recognition, image classification, and hand-written character
recognition among many other areas.

Similarly to ANNs, SVMs construct a hyperplane that divides the input space
and represents the function $f$ that maps between the input and the target outputs.
Instead of implicitly attempting to minimize the difference between the model’s actual
output and the target output following the gradient of the error (as backpropagation
does), SVMs construct a hyperplane that maintains the largest distance to the nearest
training-data point of any class. That distance is called the maximum margin
and its corresponding hyperplane divides the points ($\mathbf{x}_i$) with label ($y_i$) $1$
from those with label $-1$ in a dataset of $n$ samples in total. In other words, the
distance between the derived hyperplane and the nearest point $\mathbf{x}_i$ from either class is
maximized. Given the input attributes of a training dataset, $\mathbf{x}$, the general form of a
hyperplane can be defined as $\mathbf{w} \cdot \mathbf{x} - b = 0$ where, as in backpropagation training,
$\mathbf{w}$ is the weight (normal) vector of the hyperplane and $b / \|\mathbf{w}\|$ determines the offset
(or weight threshold/bias) of the hyperplane from the origin (see Fig. 2.14). Thus,
formally put, an SVM is a function $f(\mathbf{x}; \mathbf{w}, b)$ that predicts target outputs ($y$) and
attempts to

$$\text{minimize } \|\mathbf{w}\|, \tag{2.4}$$

$$\text{subject to } y_i(\mathbf{w} \cdot \mathbf{x}_i - b) \geq 1, \text{ for } i = 1, \ldots, n \tag{2.5}$$


The weights $\mathbf{w}$ and $b$ determine the SVM classifier. The $\mathbf{x}_i$ vectors that lie nearest
to the derived hyperplane are called support vectors. The above problem is solvable
if the training data is linearly separable (also known as a hard-margin classification
task; see Fig. 2.14). If the data is not linearly separable (soft-margin) the SVM
instead attempts to

$$\text{minimize } \left[ \frac{1}{n} \sum_{i=1}^{n} \max\left(0, 1 - y_i(\mathbf{w} \cdot \mathbf{x}_i - b)\right) \right] + \lambda \|\mathbf{w}\|^2 \tag{2.6}$$

which equals $\lambda \|\mathbf{w}\|^2$ if the hard constraints of equation (2.5) are satisfied, i.e., if
all data points are correctly classified on the right side of the margin. The value of
equation (2.6) is proportional to the distance from the margin for misclassified data
and $\lambda$ is designed so as to qualitatively determine the degree to which the margin
size should be increased versus ensuring that the $\mathbf{x}_i$ will lie on the correct side of the
margin. Evidently, if we choose a small value for $\lambda$ we approximate the hard-margin
classifier for linearly separable data.

Fig. 2.14 An example of a maximum-margin hyperplane (red thick line) and margins (black lines)
for an SVM which is trained on data samples from two classes. Solid and empty circles correspond
to data with labels $1$ and $-1$, respectively. The classification is mapped onto a two-dimensional
input vector $(x_1, x_2)$ in this example. The two data samples on the margin (the circles depicted
with red outline) are the support vectors.

The standard approach for training soft-margin classifiers is to treat the learning
task as a quadratic programming problem and search the space of $\mathbf{w}$ and $b$ to find
the widest possible margin that matches all data points. Other approaches include
sub-gradient descent and coordinate descent.
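
Of these approaches, sub-gradient descent is the simplest to sketch. The code below minimizes the soft-margin objective (mean hinge loss plus $\lambda \|\mathbf{w}\|^2$) with full-batch updates; the four data points, the value of $\lambda$, the learning rate and the number of epochs are assumptions chosen for a small linearly separable example, and a practical system would rely on a dedicated solver instead.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def objective(w, b, data, lam):
    """Soft-margin objective: mean hinge loss plus lam * ||w||^2."""
    hinge = sum(max(0.0, 1.0 - y * (dot(w, x) - b)) for x, y in data)
    return hinge / len(data) + lam * dot(w, w)

def train_svm(data, lam=0.01, eta=0.1, epochs=500):
    """Full-batch sub-gradient descent on the objective above."""
    n = len(data)
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        grad_w = [2.0 * lam * wi for wi in w]   # gradient of lam * ||w||^2
        grad_b = 0.0
        for x, y in data:
            if y * (dot(w, x) - b) < 1.0:       # point is inside or beyond the margin
                grad_w = [g - y * xi / n for g, xi in zip(grad_w, x)]
                grad_b += y / n
        w = [wi - eta * g for wi, g in zip(w, grad_w)]
        b -= eta * grad_b
    return w, b

def predict(w, b, x):
    """Class label given by the side of the hyperplane w . x - b = 0."""
    return 1 if dot(w, x) - b >= 0 else -1

# Invented, linearly separable data with labels 1 and -1
data = [((2.0, 2.0), 1), ((3.0, 1.0), 1), ((-2.0, -2.0), -1), ((-3.0, -1.0), -1)]
w, b = train_svm(data)
print(w, b, [predict(w, b, x) for x, _ in data])
```

Only points that violate the margin contribute to the hinge part of the sub-gradient, which mirrors the observation that the classifier is determined by the samples nearest to the hyperplane.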

In addition to linear classification tasks, SVMs can support non-linear
classification by employing a number of different non-linear kernels which map the
input space onto higher-dimensional feature spaces. The SVM task remains similar,
except that every dot product is replaced by a nonlinear kernel function. This
allows the algorithm to fit the maximum-margin hyperplane in a transformed feature
space. Popular kernels used in conjunction with SVMs include polynomial
functions, Gaussian radial basis functions or hyperbolic tangent functions.
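
The three kernel families can be written down in a few lines. The parameter names and default values below (degree, c, gamma, kappa) are assumptions made for illustration. The final lines check, for a degree-2 polynomial kernel in two dimensions, that the kernel value equals an ordinary dot product in an explicitly constructed feature space, which is the reason the dot product can be swapped for a kernel.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def polynomial_kernel(x, z, degree=2, c=0.0):
    return (dot(x, z) + c) ** degree

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian radial basis function of the squared distance between x and z."""
    return math.exp(-gamma * sum((xi - zi) ** 2 for xi, zi in zip(x, z)))

def tanh_kernel(x, z, kappa=1.0, c=0.0):
    return math.tanh(kappa * dot(x, z) + c)

def phi(x):
    """Explicit feature map of the degree-2 polynomial kernel (c = 0) in 2-D."""
    return (x[0] ** 2, math.sqrt(2.0) * x[0] * x[1], x[1] ** 2)

x, z = (1.0, 2.0), (3.0, 4.0)
# Both values equal 121 up to floating-point rounding
print(polynomial_kernel(x, z), dot(phi(x), phi(z)))
```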

While SVMs were originally designed to tackle binary classification problems
there exist several SVM variants that can tackle multi-class classification [284],
regression [179] and preference learning [302] that the interested reader can refer to.

SVMs have a number of advantages compared to other supervised learning
approaches. They are efficient in finding solutions when dealing with large, yet sparse,
datasets as they only depend on support vectors to construct hyperplanes. They also
handle large feature spaces well as the learning task complexity does not depend on
the dimensionality of the feature space. SVMs feature a simple convex optimization
problem which can be guaranteed to converge to a single global solution. Finally,
overfitting can be controlled easily through the soft-margin classification approach.


2.5.2.1 SVMs for Ms Pac-Man 

Similarly to ANNs, SVMs can be used for imitating the behavior of Ms Pac-Man 
expert players. The considerations about the feature (input) space and the action 
(output) space remain the same. In addition to the design of the input and output 
vectors, the size and quality of the data obtained from expert players will determine 
the performance of the SVM controlling Ms Pac-Man towards maximizing its score. 


2.5.3 Decision Tree Learning 


In decision tree learning [67], the function $f$ we attempt to derive uses a decision
tree representation which maps attributes of data observations to their target values.
The former (inputs) are represented as the nodes and the latter (outputs) are
represented as the leaves of the tree. The possible values of each node (input) are
represented by the various branches of that node. As with the other supervised learning
algorithms, decision trees can be classified depending on the output data type they
attempt to learn. In particular, decision trees can be distinguished into classification,
regression and rank trees if, respectively, the target output is a finite set of values, a
set of continuous (interval) values, or a set of ordinal relations among observations.

An example of a decision tree is illustrated in Fig. 2.15. Tree nodes correspond
to input attributes; there are branches to children for each of the possible values of
each input attribute. Further, leaves represent values of the output (car type in this
example) given the values of the input attributes as determined by the path from
the root to the leaf.

The goal of decision tree learning is to construct a mapping (a tree model) that
predicts the value of target outputs based on a number of input attributes. The basic
and most common approach for learning decision trees from data follows a
top-down recursive tree induction strategy which has the characteristics of a greedy
process. The algorithm assumes that both the input attributes and the target outputs
have finite discrete domains and are of categorical nature. If inputs or outputs are
continuous values, they can be discretized prior to constructing the tree. A tree is
gradually constructed by splitting the available training dataset into subsets based
on selections made for the attributes of the dataset. This process is repeated on an
attribute-per-attribute basis in a recursive manner.

Fig. 2.15 A decision tree example: Given age, employment status and salary (data attributes) the
tree predicts the type of car (target value) a person owns. Tree nodes (blue rounded rectangles)
represent data attributes, or inputs, whereas leaves (gray ovals) represent target values, or outputs.
Tree branches represent possible values of the corresponding parent node of the tree.

There are several variants of the above process that lead to dissimilar decision tree
algorithms. The two most notable variants of decision tree learning, however,
are the Iterative Dichotomiser 3 (ID3) [544] and its successor C4.5 [545]. The
basic tree learning algorithm has the following general steps:


1. At start, all the training examples are at the root of the tree. 

2. Select an attribute on the basis of a heuristic and pick the attribute with the 
maximum heuristic value. The two most popular heuristics are as follows: 

• Information gain: This heuristic is used by both the ID3 and the C4.5
tree-generation algorithms. Information gain $G(A)$ is based on the concept
of entropy from information theory and measures the difference in
entropy $H$ from before to after the dataset $D$ is split on an attribute $A$.

$$G(A) = H(D) - H_A(D) \qquad (2.7)$$

where $H(D)$ is the entropy of $D$, i.e., $H(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)$; $p_i$ is the
probability that an arbitrary sample in $D$ belongs to class $i$; $m$ is the total
number of classes; $H_A(D)$ is the information needed (after using attribute
$A$ to split $D$ into $v$ partitions) to classify $D$ and is calculated as
$H_A(D) = \sum_{j=1}^{v} (|D_j|/|D|) H(D_j)$, with $|x|$ being the size of $x$.

• Gain ratio: The C4.5 algorithm uses the gain ratio heuristic to reduce the
bias of information gain towards attributes with a large number of values.
The gain ratio normalizes information gain by taking into account the
number and size of branches when choosing an attribute. The information
gain ratio is the ratio between the information gain and the intrinsic value
$IV_A$ of attribute $A$:

$$GR(A) = G(A) / IV_A(D) \qquad (2.8)$$

where

$$IV_A(D) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|} \log_2\left(\frac{|D_j|}{|D|}\right) \qquad (2.9)$$


3. Based on the selected attribute from step 2, construct a new node of the
tree and split the dataset into subsets according to the possible values of the
selected attribute. The possible values of the attribute become the branches
of the node.

4. Repeat steps 2 and 3 until one of the following occurs: 

• All samples for a given node belong to the same class.

• There are no remaining attributes for further partitioning. 

• There are no data samples left. 
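
The two heuristics of step 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the dictionary-based row format and the toy data are hypothetical and are not taken from Fig. 2.15 or from the ID3 and C4.5 implementations.

```python
import math
from collections import Counter

def entropy(labels):
    """H(D) = -sum_i p_i * log2(p_i) over the class labels in D."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def partition(rows, labels, attribute):
    """Group the labels of D by the value each row takes on `attribute`."""
    parts = {}
    for row, label in zip(rows, labels):
        parts.setdefault(row[attribute], []).append(label)
    return parts

def information_gain(rows, labels, attribute):
    """G(A) = H(D) - H_A(D), following Eq. (2.7)."""
    total = len(labels)
    parts = partition(rows, labels, attribute)
    h_after = sum(len(p) / total * entropy(p) for p in parts.values())
    return entropy(labels) - h_after

def gain_ratio(rows, labels, attribute):
    """GR(A) = G(A) / IV_A(D), following Eqs. (2.8) and (2.9)."""
    total = len(labels)
    parts = partition(rows, labels, attribute)
    intrinsic = -sum(len(p) / total * math.log2(len(p) / total) for p in parts.values())
    if intrinsic == 0:  # the attribute takes a single value, so it cannot split D
        return 0.0
    return information_gain(rows, labels, attribute) / intrinsic

# Hypothetical toy dataset: the car type depends only on employment status.
rows = [
    {"employed": "yes", "age": "young"},
    {"employed": "yes", "age": "old"},
    {"employed": "no", "age": "young"},
    {"employed": "no", "age": "old"},
]
labels = ["sports", "sports", "none", "none"]

# Step 2 of the algorithm: pick the attribute with the maximum heuristic value.
best = max(["employed", "age"], key=lambda a: information_gain(rows, labels, a))
```

In this toy dataset the "employed" attribute separates the classes perfectly, so it receives the highest information gain and would become the root node.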


2.5.3.1 Decision Trees for Ms Pac-Man 

As with ANNs and SVMs, decision tree learning requires data to be trained on.
Presuming that data from expert Ms Pac-Man players would be of good quality
and quantity, decision trees can be constructed to predict the strategy of Ms Pac-Man
based on a number of ad-hoc designed attributes of the game state. Figure
2.16 illustrates a simplified hypothetical decision tree for controlling Ms Pac-Man.
According to that example, if a ghost is nearby then Ms Pac-Man checks if power
pills are available within a close distance and aims for those; otherwise it takes actions so
that it evades the ghost. If, alternatively, ghosts are not visible, Ms Pac-Man checks
for pellets. If those are nearby or at a fair distance then it aims for them; otherwise
it aims for the fruit, if that is available on the level. It is important to note that the
leaves of the tree in our example represent control strategies (macro-actions) rather
than actual actions (up, down, left, right) for Ms Pac-Man.

Fig. 2.16 A decision tree example for controlling Ms Pac-Man. The tree is trained on data from
expert Ms Pac-Man players. Given the distance from the nearest ghost, power pill and pellet (data
attributes) the tree predicts the strategy Ms Pac-Man needs to follow.
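
Once learned, a tree such as the one described for Ms Pac-Man is simply a nested sequence of attribute tests. The sketch below hard-codes the hypothetical tree from the text; the function name, the argument names, the distance categories and the fallback used when no fruit is present are all assumptions made for illustration.

```python
def choose_strategy(ghost_nearby, power_pill_close, pellet_distance, fruit_available):
    """Return a macro-action (a leaf of the tree) for the given game-state attributes.

    pellet_distance is one of "nearby", "fair" or "far" (hypothetical categories).
    """
    if ghost_nearby:
        # Left subtree: a ghost is nearby.
        return "aim for power pill" if power_pill_close else "evade ghost"
    # Right subtree: no ghost is visible.
    if pellet_distance in ("nearby", "fair"):
        return "aim for pellet"
    # Assumption: if no fruit is on the level, fall back to aiming for pellets.
    return "aim for fruit" if fruit_available else "aim for pellet"
```

Each returned string is a control strategy rather than a direction, so a separate routine (e.g., path-finding) would still have to translate it into up, down, left or right.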


2.5.4 Further Reading 

The core supervised learning algorithms are covered in detail in the Russell and
Norvig classic AI textbook [582] including decision tree learning (Chapter 18) and
artificial neural networks (Chapter 19). Detailed descriptions of artificial neural
networks and backpropagation can also be found in the book of Haykin [253]. Deep
architectures of ANNs are covered in great detail in the deep learning book by
Goodfellow et al. [231]. Finally, support vector machines are covered in the tutorial paper
of Burges.

The preference learning version of backpropagation in shallow and deep
architectures can be found in [430, 436] whereas RankSVM is covered in the original
paper of Joachims [303].


2.6 Reinforcement Learning 

Reinforcement Learning (RL) [672] is a machine learning approach inspired by
behaviorist psychology and, in particular, the way humans and animals learn to take
decisions via (positive or negative) rewards received from their environment. In
reinforcement learning, samples of good behavior are usually not available (as they
are in supervised learning); instead, similarly to evolutionary (reinforcement) learning, the
training signal of the algorithm is provided by the environment based on how an
agent is interacting with it. At a particular point in time $t$, the agent is in a particular
state $s$ and decides to take an action $a$ from all the available actions in its current
state. As a response the environment delivers an immediate reward, $r$. Through
the continuous interaction between the agent and its environment, the agent gradually
learns to select actions that maximize its sum of rewards. RL has been studied
from a variety of disciplinary perspectives including operations research, game theory,
information theory, and genetic algorithms and has been successfully applied in
problems which involve a balance between long-term and short-term rewards such
as robot control and games [464, 629]. An example of the RL problem is
illustrated through a maze navigation task in Fig. 2.17.

Fig. 2.17 A reinforcement learning example. The agent (triangle) attempts to reach the goal (G)
by taking an action (a) among all available actions in its current state (s). The agent receives an
immediate reward (r) and the environment notifies the agent about its new state after taking the
action.


More formally, the aim of the agent is to discover a policy ($\pi$) for selecting
actions that maximize a measure of a long-term reward such as the expected cumulative
reward. A policy is a strategy that the agent follows in selecting actions, given
the state it is in. If the function that characterizes the value of each action either
exists or is learned, the optimal policy ($\pi^*$) can be derived by selecting the action
with the highest value. The interactions with the environment occur in discrete time
steps ($t = \{0, 1, 2, \ldots\}$) and are modeled as a Markov decision process (MDP). The
MDP is defined by


• $S$: A set of states $\{s_1, \ldots, s_n\} \in S$. The environment states are a function of the
agent’s information about the environment (i.e., the agent’s inputs).

• $A$: A set of actions $\{a_1, \ldots, a_m\} \in A$ possible in each state $s$. The actions represent
the different ways the agent can act in the environment.

• $P(s, s', a)$: The probability of transition from $s$ to $s'$ given $a$. $P$ gives the
probability of ending in state $s'$ after picking action $a$ in state $s$ and it follows the
Markov property implying that future states of the process depend only upon
the present state, not on the sequence of events that preceded it. As a result, the
Markov property of $P$ makes predictions of 1-step dynamics possible.

• $R(s, s', a)$: The reward function on transition from $s$ to $s'$ given $a$. When the agent
in state $s$ picks an action $a$ and moves to state $s'$, it receives an immediate reward
$r$ from the environment.

$P$ and $R$ define the world model and represent, respectively, the environment’s
dynamics ($P$) and the long-term reward ($R$) for each policy. If the world model is
known there is no need to learn to estimate the transition probability and reward
function and we thus directly calculate the optimal strategy (policy) using model-based
approaches such as dynamic programming [44]. If, instead, the world model
is unknown we approximate the transition and the reward functions by learning
estimates of future rewards given by picking action $a$ in state $s$. We then calculate
our policy based on these estimates. Learning occurs via model-free methods such
as Monte Carlo search and temporal difference (TD) learning [672]. In this section we
put an emphasis on the latter set of algorithms and, in particular, we focus on the
most popular algorithm of TD learning: Q-learning. Before delving into the details
of the Q-learning algorithm, we first discuss a few core RL concepts and provide a
high-level taxonomy of RL algorithms according to RL problems and tools used for
tackling them. We will use this taxonomy to place Q-learning with respect to RL as
a whole.
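
To make the model-based case concrete, the following sketch applies value iteration, one standard dynamic programming method, to a tiny MDP whose $P$ and $R$ are fully known. The choice of value iteration, the dictionary layout of `P` and `R`, the discount factor `gamma` and the two-state example are all assumptions made for illustration; they do not come from the text.

```python
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-9):
    """Compute state values and a greedy policy for a known world model.

    P[(s, a)] lists (probability, next_state) pairs; R[(s, a, next_state)] is the
    immediate reward of that transition (missing entries count as 0).
    """
    def backup(s, a, V):
        # Expected immediate reward plus discounted value of the next state.
        return sum(p * (R.get((s, a, s2), 0.0) + gamma * V[s2]) for p, s2 in P[(s, a)])

    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(backup(s, a, V) for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:  # stop once no state value changes noticeably
            break
    policy = {s: max(actions, key=lambda a: backup(s, a, V)) for s in states}
    return V, policy

# Hypothetical two-state world: moving succeeds with probability 0.8.
states = ["corridor", "goal"]
actions = ["stay", "move"]
P = {
    ("corridor", "stay"): [(1.0, "corridor")],
    ("corridor", "move"): [(0.8, "goal"), (0.2, "corridor")],
    ("goal", "stay"): [(1.0, "goal")],
    ("goal", "move"): [(1.0, "goal")],
}
R = {("corridor", "move", "goal"): 1.0}
V, policy = value_iteration(states, actions, P, R)
```

No interaction with the environment is needed here: the policy follows directly from $P$ and $R$, which is exactly what is unavailable in the model-free setting.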


2.6.1 Core Concepts and a High-Level Taxonomy 

A central question in RL problems is the right balance between the exploitation of
current learned knowledge versus the exploration of new unseen territories in the
search space. Both randomly selecting actions (no exploitation) and always greedily
selecting the best action according to a measure of performance or reward (no
exploration) are strategies that generally yield poor results in stochastic environments.
While several approaches have been proposed in the literature to address
the exploration-exploitation balance issue, a popular and rather efficient mechanism
for RL action selection is called $\epsilon$-greedy, determined by the $\epsilon \in [0,1]$ parameter.
According to $\epsilon$-greedy the RL agent chooses the action it believes will return the
highest future reward with probability $1 - \epsilon$; otherwise, it chooses an action
uniformly at random.
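
A minimal sketch of this action-selection rule is given below. The function name and the dictionary of value estimates are hypothetical; only the rule itself comes from the text.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action from `q_values`, a dict mapping actions to value estimates.

    With probability epsilon an action is drawn uniformly at random (exploration);
    otherwise the action with the highest estimate is chosen (exploitation).
    """
    if rng.random() < epsilon:
        return rng.choice(list(q_values))
    return max(q_values, key=q_values.get)
```

Setting `epsilon` to 0 gives the purely greedy strategy and setting it to 1 gives purely random selection, the two extremes discussed above.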

RL problems can be classified into episodic versus incremental. In the former
class, algorithm training occurs offline and within a finite horizon of multiple
training instances. The finite sequence of states, actions and reward signals received
within that horizon is called an episode. Monte Carlo methods that rely on repeated
random sampling, for instance, are a typical example of episodic RL. In the latter
class of algorithms, instead, learning occurs online and it is not bounded by a
horizon. We meet TD learning under incremental RL algorithms.

Another distinction is between off-policy and on-policy RL algorithms. An
off-policy learner approximates the optimal policy independently of the agent’s actions.
As we will see below, Q-learning is an off-policy learner since it estimates the return
for state-action pairs assuming that a greedy policy is followed. An on-policy RL
algorithm instead approximates the policy as a process being tied to the agent’s
actions including the exploration steps.

Bootstrapping is a central notion within RL that classifies algorithms based on
the way they optimize state values. Bootstrapping estimates how good a state is
based on how good we think the next state is. In other words, with bootstrapping
we update an estimate based on another estimate. Both TD learning and dynamic
programming use bootstrapping to learn from the experience of visiting states and
updating their values. Monte Carlo search methods instead do not use bootstrapping
and thus learn each state value separately.
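
The difference between the two families can be written out as update rules for a state-value estimate. The notation below is the common textbook one rather than this chapter's own: $V$ is the value estimate, $\alpha$ a step size, $\gamma$ a discount factor, $T$ the final time step of an episode and $G_t$ the discounted return observed from time $t$ onwards; all of these symbols are assumptions introduced for this sketch.

```latex
% TD learning (bootstrapping): the target relies on the current estimate V(s_{t+1}).
V(s_t) \leftarrow V(s_t) + \alpha \left[ r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \right]

% Monte Carlo (no bootstrapping): the target is the return sampled over the episode.
V(s_t) \leftarrow V(s_t) + \alpha \left[ G_t - V(s_t) \right],
\qquad G_t = \sum_{k=0}^{T-t-1} \gamma^{k} r_{t+k+1}
```

In the first rule an estimate is corrected using another estimate, whereas the second rule has to wait until the episode ends before any value can be updated.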

Finally, the notion of backup is central in RL and acts as a distinctive feature
among RL algorithms. With backup we go backwards from a state in the future
to the (current) state we want to evaluate, $s_t$, and consider the in-between state
values in our estimates. The backup operation has two main properties: its depth,
which varies from one step backwards to a full backup, and its breadth, which
varies from a (randomly) selected number of sample states within each time step to
a full-breadth backup.

Based on the above criteria we can identify three major RL algorithm types: 

1. Dynamic programming. In dynamic programming knowledge of the world
model ($P$ and $R$) is required and the optimal policy is calculated via bootstrapping.

2. Monte Carlo methods. Knowledge of the world model is not required for
Monte Carlo methods. Algorithms of this class (e.g., MCTS) are ideal for offline
(episodic) training and they learn via sample-breadth and full-depth backup.
Monte Carlo methods do not use bootstrapping, however.

3. TD learning. As with Monte Carlo methods knowledge of the world model is
not required and it is thus estimated. Algorithms of this type (e.g., Q-learning)
learn from experience via bootstrapping and variants of backup.

In the following section we cover the most popular TD learning algorithm in the 
RL literature with the widest use in game AI research. 


2.6.2 Q-Learning 

Q-learning [748] is a model-free, off-policy, TD learning algorithm that relies on
a tabular representation of $Q(s,a)$ values (hence its name). Informally, $Q(s,a)$
represents how good it is to pick action $a$ in state $s$. Formally, $Q(s,a)$ is the expected
discounted reinforcement of taking action $a$ in state $s$. The Q-learning agent learns
from experience by picking actions and receiving rewards via bootstrapping.

The goal of the Q-learning agent is to maximize its expected reward by picking
the right action at each state. The reward, in particular, is a weighted sum of
the expected values of the discounted future rewards. The Q-learning algorithm is
a simple update on the Q values in an iterative fashion. Initially, the Q table has
arbitrary values as set by the designer. Then each time the agent selects an action
$a$ from state $s$, it visits state $s'$, it receives an immediate reward $r$, and updates its
$Q(s,a)$ value as follows:

$$Q(s,a) \leftarrow Q(s,a) + \alpha \{ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \} \qquad (2.10)$$

where $\alpha \in [0,1]$ is the learning rate and $\gamma \in [0,1]$ is the discount factor. The
learning rate determines the extent to which the new estimate for $Q$ will override
the old estimate. The discount factor weights the importance of earlier versus later
rewards; the closer $\gamma$ is to 1, the greater the weight given to future reinforcements.
As seen from equation (2.10), the algorithm uses bootstrapping since it maintains
estimates of how good a state-action pair is (i.e., $Q(s,a)$) based on how good it thinks
the next state is (i.e., $Q(s',a')$). It also uses a one-step-depth, full-breadth backup to
estimate $Q$ by taking into consideration all $Q$ values of all possible actions $a'$ of
the newly visited state $s'$. It is proven that by using the learning rule of equation
(2.10) the $Q(s,a)$ values converge to the expected future discounted reward [748].
The optimal policy can then be calculated based on the Q-values; the agent in state
$s$ selects the action $a$ with the highest $Q(s,a)$ value. In summary, the basic steps of
the algorithm are as follows:


Given an immediate reward function $r$ and a table of $Q(s,a)$ values for all
possible actions in each state:

1. Initialize the table with arbitrary Q values; e.g., $Q(s,a) = 0$.

2. $s \leftarrow$ Start state.

3. While not finished* do:

(a) Choose an action $a$ based on policy derived from $Q$ (e.g., $\epsilon$-greedy).

(b) Apply the action, transit to state $s'$, and receive an immediate reward
$r$.

(c) Update the value of $Q(s,a)$ as per (2.10).

(d) $s \leftarrow s'$.

*The most commonly used termination conditions are the algorithm’s speed — 
i.e., stop within a number of iterations—or the quality of convergence—i.e., 
stop if you are satisfied with the obtained policy. 
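
The steps above can be sketched as a short tabular implementation. The environment is a hypothetical one-dimensional corridor invented for this example (it is not the maze of Fig. 2.17), and the parameter values, the random tie-breaking between equally valued actions and the fixed number of episodes used as a termination condition are likewise assumptions.

```python
import random

def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a corridor with states 0..n_states-1.

    Actions are -1 (left) and +1 (right); reaching the last state ends the
    episode and yields an immediate reward of 1, every other step yields 0.
    """
    rng = random.Random(seed)
    actions = (-1, +1)
    goal = n_states - 1
    Q = {(s, a): 0.0 for s in range(n_states) for a in actions}  # step 1
    for _ in range(episodes):
        s = 0                                                    # step 2
        while s != goal:                                         # step 3
            # (a) epsilon-greedy choice; ties between actions are broken at random
            if rng.random() < epsilon:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda act: (Q[(s, act)], rng.random()))
            # (b) apply the action (walls keep the agent inside the corridor)
            s_next = min(max(s + a, 0), goal)
            r = 1.0 if s_next == goal else 0.0
            # (c) update Q(s, a) as per Eq. (2.10)
            best_next = max(Q[(s_next, act)] for act in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            # (d) move on to the new state
            s = s_next
    return Q

Q = q_learning()
greedy_policy = {s: max((-1, +1), key=lambda act: Q[(s, act)]) for s in range(4)}
```

After training, the greedy policy read off the table moves right in every non-goal state, and the Q values shrink by a factor of `gamma` for each extra step that separates a state from the goal.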


2.6.2.1 Limitations of Q-Learning 

Q-learning has a number of limitations associated primarily with its tabular
representation. First of all, depending on the chosen state-action representation the size
of the state-action space might be computationally very expensive to handle. As
the Q table size grows our computational needs for memory allocation and information
retrieval increase. Further, we may experience very long convergence since
learning time is exponential to the size of the state-action space. To overcome these
obstacles and get decent performance from RL learners we need to devise a way of
reducing the state-action space. Section 2.8 outlines the approach of using artificial
neural networks as Q-value function approximators, directly bypassing the Q-table
limitation and yielding compressed representations for our RL learner.


2.6.2.2 Q-Learning for Ms Pac-Man 

Q-learning is applicable for controlling Ms Pac-Man as long as we define a suitable
state-action space and we design an appropriate reward function. A state in Ms Pac-Man
could be represented directly as the current snapshot of the game, i.e., where
Ms Pac-Man and ghosts are and which pellets and power pills are still available.
That representation, however, yields a prohibitive number of game states for a Q-table
to be constructed and processed. Instead, it might be preferred to choose a
more indirect representation such as whether ghosts and pellets are nearby or not.
Possible actions for Ms Pac-Man could be that it either keeps its current direction,
it turns backward, it turns left, or it turns right. Finally, the reward function can be
designed to reward Ms Pac-Man positively when it eats a pellet, a ghost or a power
pill, whereas it could penalize Ms Pac-Man when it dies.
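
A sketch of such an indirect representation is given below. The two boolean state attributes, the event names and, in particular, the reward magnitudes are hypothetical choices made for illustration; the text only specifies which events are rewarded or penalized.

```python
# The four relative actions named in the text.
ACTIONS = ("keep direction", "turn backward", "turn left", "turn right")

def abstract_state(ghost_nearby, pellet_nearby):
    """Indirect state: only whether a ghost and a pellet are nearby."""
    return (bool(ghost_nearby), bool(pellet_nearby))

def reward(event):
    """Immediate reward for a game event (the magnitudes are assumptions)."""
    values = {"pellet": 1.0, "power pill": 5.0, "ghost": 10.0, "death": -50.0}
    return values.get(event, 0.0)

# The resulting Q table has 2 * 2 * 4 = 16 entries, all initialized to zero.
q_table = {
    (abstract_state(g, p), a): 0.0
    for g in (False, True)
    for p in (False, True)
    for a in ACTIONS
}
```

With only 16 entries the table is trivially small, in contrast to a snapshot-based state that would need one row for every combination of positions and remaining pellets.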

It is important to note that both Pac-Man and Ms Pac-Man follow the Markov 
property in the sense that any future game states may depend only upon the present
game state. There is one core difference however; while the transition probability in 
Pac-Man is known given its deterministic nature, it is largely unknown in Ms Pac- 
Man given the stochastic behavior of the ghosts in that game. Thereby, Pac-Man can 
theoretically be solved via model-based approaches (e.g., dynamic programming) 
whereas the world model of Ms Pac-Man can only be approximated via model-free 
methods such as temporal difference learning. 


2.6.3 Further Reading 

The RL book of Sutton and Barto [672] is highly recommended for a thorough
presentation of RL including Q-learning (Chapter 6). The book is freely available
online (http://incompleteideas.net/sutton/book/ebook/the-book.html). A draft of the
latest (2017) edition of the book is also available
(http://incompleteideas.net/sutton/book/the-book-2nd.html).
The survey paper of Kaelbling et al. [316] is another recommended reading of the
approaches covered. Finally, for an in-depth analysis of model-based RL approaches
you are referred to the dynamic programming book of Bertsekas.


2.7 Unsupervised Learning

As stated earlier, the utility type (or training signal) determines the class of the AI
algorithm. In supervised learning the training signal is provided as data labels (target
outputs) and in reinforcement learning it is derived as a reward from the environment.
Unsupervised learning instead attempts to discover associations of the input
by searching for patterns among all input data attributes and without having access
to a target output; this machine learning process is usually inspired by Hebbian
learning [256] and the principles of self-organization [20]. With unsupervised learning
we focus on the intrinsic structure of and associations in the data instead of
attempting to imitate or predict target values. We cover two unsupervised learning
tasks with corresponding algorithms: clustering and frequent pattern mining.


2.7.1 Clustering 

Clustering is the unsupervised learning task of finding unknown groups among a
number of data points so that data points within a group (or cluster) are similar to each
other and dissimilar to data points from other clusters. Clustering has found applications in
detecting groups of data across multiple attributes and in data reduction tasks such
as data compression, noise smoothing, outlier detection and dataset partition.
Clustering is of key importance for games with applications in player modeling, game
playing and content generation.

As with classification, clustering places data into classes; the labels of the classes,
however, are unknown a priori and clustering algorithms aim to discover them by
assessing their quality iteratively. Since the correct clusters are unknown, similarity
(and dissimilarity) depends only on the data attributes used. Good clusters are
characterized by two core properties: 1) high intra-cluster similarity, or else, high
compactness and 2) low inter-cluster similarity, or else, good separation. A popular
measure of compactness is the average distance between every sample in the cluster
and the closest representative point (e.g., centroid) as used in the k-means
algorithm. Examples of separation measures include the single link and the complete
link: the former is the smallest distance between any sample in one cluster and any
sample in the other cluster; the latter is the largest distance between any sample in
one cluster and any sample in the other cluster. While compactness and separation
are objective measures of cluster validity, it is important to note that they are not
indicators of cluster meaningfulness.
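
The three measures can be computed directly from their definitions. The sketch below assumes that clusters are lists of coordinate tuples and that distance is Euclidean; the function names are hypothetical.

```python
import math

def centroid(cluster):
    """Mean point of a cluster given as a list of equal-length coordinate tuples."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))

def compactness(cluster):
    """Average Euclidean distance between each sample and the cluster centroid."""
    center = centroid(cluster)
    return sum(math.dist(p, center) for p in cluster) / len(cluster)

def single_link(cluster_a, cluster_b):
    """Smallest distance between any sample in one cluster and any in the other."""
    return min(math.dist(p, q) for p in cluster_a for q in cluster_b)

def complete_link(cluster_a, cluster_b):
    """Largest distance between any sample in one cluster and any in the other."""
    return max(math.dist(p, q) for p in cluster_a for q in cluster_b)
```

For instance, for the clusters `[(0, 0), (0, 2)]` and `[(3, 0), (3, 2)]` the single link is 3 while the complete link is the diagonal distance between opposite corners.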

Beyond the validity metrics described above, clustering algorithms are defined by 
a membership function and a search procedure. The membership function defines 
the structure of the clusters in relation to the data samples. The search procedure is 
a strategy we follow to cluster our data given a membership function and a validity 
metric. Examples of such strategies include splitting all data points into clusters at
once (as in k-means), or recursively merging (or splitting) clusters (as in hierarchical
clustering).

Clustering can be realized via a plethora of algorithms including hierarchical
clustering, k-means [411], k-medoids [329], DBSCAN and self-organizing
maps [347]. The algorithms are dissimilar in the way they define what a cluster
is and how they form it. Selecting an appropriate clustering algorithm and its
corresponding parameters, such as which distance function to use or the number of
clusters to expect, depends on the aims of the study and the data available. In the
remainder of the section we outline the clustering algorithms we find to be the most
useful for the study of AI in games.


2.7.1.1 K-Means Clustering 

K-means [411] is a vector quantization method that is considered the most popular
clustering algorithm as it offers a good balance between simplicity and effectiveness.
It follows a simple data partitioning approach according to which it partitions
a database of objects into a set of k clusters, such that the sum of squared Euclidean
distances between data points and their corresponding cluster center (centroid) is
minimized; this distance is also known as the quantization error.

In k-means each cluster is defined by one point, that is the centroid of the cluster,
and each data sample is assigned to the closest centroid. The centroid is the
mean of the data samples in the cluster. The intra-cluster validity metric used by
k-means is the average distance to the centroid. Initially, the data samples are
randomly assigned to a cluster and then the algorithm proceeds by alternating between
the re-assignment of data into clusters and the update of the resulting centroids. The
basic steps of the algorithm are as follows:


Given k 

1. Randomly partition the data points into k nonempty clusters. 

2. Compute the position of the centroids of the clusters of the current partitioning.
Centroids are the centers (mean points) of the clusters.

3. Assign each data point to the cluster with the nearest centroid. 

4. Stop when the assignment does not change; otherwise go to step 2. 
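
The four steps translate almost line by line into code. The sketch below is a plain-Python illustration rather than a production implementation: the function signature, the iteration cap and the handling of a cluster that becomes empty are assumptions that go beyond the steps listed above.

```python
import math
import random

def k_means(points, k, seed=0, max_iter=100):
    """Cluster a list of coordinate tuples; returns (assignment, centroids)."""
    rng = random.Random(seed)
    # Step 1: randomly partition the data points into k nonempty clusters.
    order = list(range(len(points)))
    rng.shuffle(order)
    assignment = [0] * len(points)
    for rank, index in enumerate(order):
        assignment[index] = rank % k
    centroids = []
    for _ in range(max_iter):
        # Step 2: compute the centroid (mean point) of every cluster.
        centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if not members:  # assumption: re-seed an emptied cluster with a random point
                members = [rng.choice(points)]
            centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
        # Step 3: assign each data point to the cluster with the nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Step 4: stop when the assignment does not change.
        if new_assignment == assignment:
            break
        assignment = new_assignment
    return assignment, centroids

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
assignment, centroids = k_means(points, k=2)
```

On this toy dataset the two well-separated groups of three points end up in different clusters; which group receives label 0 depends on the random initial partition.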


While k-means is very popular due to its simplicity it has a number of con- 
siderable weaknesses. First, it is applicable only to data objects in a continuous 
space. Second, one needs to specify the number of clusters, k, in advance. Third, 
it is not suitable to discover clusters with non-convex shapes as it can only find 
hyper-spherical clusters. Finally, k-means is sensitive to outliers as data points with 
extremely large (or small) values may substantially distort the distribution of the 
data and affect the performance of the algorithm. As we will see below, hierarchical
clustering manages to overcome some of the above drawbacks, suggesting a useful
alternative approach to data clustering.


2.7.1.2 Hierarchical Clustering

Clustering methods that attempt to build a hierarchy of clusters fall under the
hierarchical clustering approach. Generally speaking there are two main strategies
available: the agglomerative and the divisive. The former constructs hierarchies in
a bottom-up fashion by gradually merging data points together, whereas the latter
constructs hierarchies of clusters by gradually splitting the dataset in a top-down
fashion. Both clustering strategies are greedy. Hierarchical clustering uses a distance
matrix as the clustering strategy (whether agglomerative or divisive). This method
does not require the number of clusters k as an input, but needs a termination
condition.

Indicatively, we present the basic steps of the agglomerative clustering algorithm 
which are as follows: 


Given k 

1. Create one cluster per data sample. 

2. Find the two closest data samples—i.e., find the shortest Euclidean distance 
between two points (single link)—which are not in the same cluster. 

3. Merge the clusters containing these two samples. 

4. Stop if there are k clusters; otherwise go to step 2. 
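
The agglomerative steps can be sketched as follows, using the single link as the merging criterion and a target number of clusters `k` as the termination condition. The function name and the brute-force search over all cluster pairs are choices made for readability here, not part of the original description.

```python
import math

def agglomerative(points, k):
    """Single-link agglomerative clustering of a list of coordinate tuples."""
    clusters = [[p] for p in points]                  # step 1: one cluster per sample
    while len(clusters) > k:                          # step 4: stop at k clusters
        # Step 2: find the two closest samples that are not in the same cluster.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(p, q) for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Step 3: merge the clusters containing these two samples.
        merged = clusters.pop(j)                      # j > i, so index i stays valid
        clusters[i].extend(merged)
    return clusters

points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
three = agglomerative(points, k=3)
two = agglomerative(points, k=2)
```

Recording the distance at which each merge happens, instead of stopping at `k`, would give the information needed to draw the full hierarchy of merges.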


In divisive hierarchical clustering instead, all data are initially in the same cluster
which is split until every data point is in its own cluster, either following a split strategy
(e.g., Divisive ANAlysis Clustering (DIANA) [330]) or employing another clustering
algorithm to split the data in two clusters (e.g., 2-means).

Once clusters of data are iteratively merged (or split), one can visualize the clusters
by decomposing the data into several levels of nested partitioning. In other
words, one can observe a tree representation of clusters which is also known as a
dendrogram. The clustering of data is obtained by cutting the dendrogram at the
desired level of squared Euclidean distance. For the interested reader, a dendrogram
example is illustrated in another chapter of this book.

Hierarchical clustering represents clusters as the set of data samples contained in
them and, as a result, a data sample belongs to the same cluster as its closest sample.
In k-means instead, each cluster is represented by a centroid and thus a data sample
belongs to the cluster represented by the closest centroid. Further, when it comes to
cluster validity metrics, agglomerative clustering uses the shortest distance between
any sample in one cluster and a sample in another whereas k-means uses the average
distance to the centroid. Due to these different algorithmic properties hierarchical
clustering has the capacity to cluster data that come in any form of a connected
shape; k-means, on the other hand, is only limited to hyper-spherical clusters.


2.7.1.3 Clustering for Ms Pac-Man

One potential application of clustering for controlling Ms Pac-Man would be to 
model ghost behaviors and use that information as an input to the controller of 
Ms Pac-Man. Whether it is k-means or hierarchical clustering, the algorithm would 
consider different attributes of ghost behavior—such as level exploration, behavior 
divergence, distance between ghosts, etc.—and cluster the ghosts into behavioral 
patterns or profiles. The controller of Ms Pac-Man would then consider the ghost 
profile met in a particular level as an additional input for guiding the agent better. 

Arguably, beyond agent control, we can think of better uses of clustering for this 
game such as profiling Ms Pac-Man players and generating appropriate levels or 
challenges for them so that the game is balanced. As mentioned earlier, however, 
the focus of the Ms Pac-Man examples is on the control of the playing agent for the 
purpose of maintaining a consistent paradigm throughout this chapter. 


2.7.2 Frequent Pattern Mining

Frequent pattern mining is a set of techniques that attempt to derive frequent
patterns and structures in data. Patterns include sequences and itemsets. Frequent
pattern mining was first proposed for mining association rules, which aims to
identify a number of data attributes that frequently associate to each other, thereby
forming conditional rules among them. There are two types of frequent pattern mining
that are of particular interest for game AI: frequent itemset mining and frequent
sequence mining. The former aims to find structure among data attributes
that have no particular internal order whereas the latter aims to find structure among
data attributes based on an inherent temporal order. While associated with the
unsupervised learning paradigm, frequent pattern mining is dissimilar in both the aims
and the algorithmic procedures it follows.

Popular and scalable frequent pattern mining methods include the Apriori
algorithm for itemset mining, and SPADE [793] and GSP [652, 434, 621] for
sequence mining. In the remainder of this section we outline Apriori and GSP as
representative algorithms for frequent itemset and frequent sequence mining,
respectively.


2.7.2.1 Apriori 

Apriori is an algorithm for frequent itemset mining. The algorithm is appropriate
for mining datasets that contain sets of instances (also named transactions) that each
feature a set of items, or an itemset. Examples of transactions include books bought
by an Amazon customer or apps bought by a smartphone user. The algorithm is
very simple and can be described as follows: given a predetermined threshold named
support (T), Apriori detects the itemsets which are subsets of at least T transactions
in the database. In other words, Apriori will attempt to identify all itemsets that have
at least a minimum support which is the minimum number of times an itemset exists
in the dataset.

To demonstrate Apriori in a game example, below we indicatively list events
from four players of an online role playing game:

• <Completed more than 10 levels; Most achievements unlocked; Bought the 
shield of the magi> 

• <Completed more than 10 levels; Bought the shield of the magi> 

• <Most achievements unlocked; Bought the shield of the magi; Found the Wiz- 
ard’s purple hat> 

• <Most achievements unlocked; Found the Wizard’s purple hat; Completed more 
than 10 levels; Bought the shield of the magi> 

If in the example dataset above we assume that the support is 3, the following
1-itemsets (sets of only one item) can be found: <Completed more than 10 levels>,
<Most achievements unlocked> and <Bought the shield of the magi>. If instead,
we seek 2-itemsets with a support threshold of 3 we can find <Completed more than
10 levels, Bought the shield of the magi> and <Most achievements unlocked, Bought
the shield of the magi>, as each of these pairs is contained in three of the
transactions above. Longer itemsets are not available (not frequent) for support
count 3. The process can be repeated for any support threshold we wish to detect
frequent itemsets for.
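
The example can be reproduced with a compact level-wise implementation. The sketch below is a simplified illustration of the Apriori idea (join, prune, count) and makes no claim to match the original algorithm's data structures; the function name and the shortened item labels ("levels", "achievements", "shield", "hat") are introduced here for readability.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for all itemsets that are contained in
    at least `min_support` transactions."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    size = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        size += 1
        # Join: unions of frequent itemsets that contain exactly `size` items.
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        # Prune: every (size - 1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, size - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

# The four player transactions listed above, with shortened item labels.
players = [
    {"levels", "achievements", "shield"},
    {"levels", "shield"},
    {"achievements", "shield", "hat"},
    {"achievements", "hat", "levels", "shield"},
]
frequent = apriori(players, min_support=3)
```

With a support of 3 the result contains three 1-itemsets and two 2-itemsets, and no 3-itemset survives the prune step because "levels" and "achievements" appear together in only two transactions.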


2.7.2.2 Generalized Sequential Patterns

Frequent itemset mining algorithms are not adequate if the sequence of events is
the critical information we wish to mine from a dataset. When the dataset contains
events in an ordered set of sequences, such as temporal sequence data or time series,
we need instead to opt for a frequent sequence mining approach. The sequence mining
problem can be simply described as the process of finding frequently occurring
subsequences given a sequence or a set of sequences.

More formally, given a dataset in which each sample is a sequence of events,
namely a data sequence, a sequential pattern defined as a subsequence of events is
a frequent sequence if it occurs in the samples of the dataset regularly. A frequent
sequence can be defined as a sequential pattern that is supported by, at least, a
minimum amount of data sequences. This amount is determined by a threshold named
minimum support value. A data sequence supports a sequential pattern if and only
if it contains all the events present in the pattern in the same order. For example, the
data sequence $\langle x_0, x_1, x_2, x_3, x_4, x_5 \rangle$ supports the pattern $\langle x_0, x_5 \rangle$. As with
frequent itemset mining, the amount of data sequences that support a sequential pattern
is referred to as the support count.

The Generalized Sequential Patterns (GSP) algorithm [652] is a popular method
for mining frequent sequences in data. GSP starts by extracting the frequent
sequences with a single event, namely 1-sequences. That set of sequences is
self-joined to generate all 2-sequence candidates, for which we calculate the support
count. Those sequences that are frequent (i.e., their support count reaches the
minimum support threshold) are then self-joined to generate the set of 3-sequence
candidates. The algorithm gradually increases the length of the sequences in each
algorithmic step until the next set of candidates is empty. The basic principle of
the algorithm is that if a sequential pattern is frequent, then its contiguous
subsequences are also frequent.
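The level-wise procedure can be sketched as follows. This is a simplified illustration under our own assumptions: each element of a sequence is a single event (no itemsets per element, no time constraints), the minimum support is at least 1, and the event names and the helpers supports and gsp are hypothetical.

```python
def supports(data_sequence, pattern):
    """True if all events of the pattern occur in the data sequence in the same order."""
    remaining = iter(data_sequence)
    # `event in remaining` consumes the iterator up to the first match,
    # so later pattern events can only match later positions.
    return all(event in remaining for event in pattern)

def gsp(sequences, min_support):
    """Return a dict mapping each frequent sequential pattern to its support count."""
    def support(pattern):
        return sum(supports(s, pattern) for s in sequences)

    events = sorted({e for s in sequences for e in s})
    level = [(e,) for e in events if support((e,)) >= min_support]
    frequent = {}
    while level:
        for pattern in level:
            frequent[pattern] = support(pattern)
        # Self-join: a and b merge when a without its first event
        # equals b without its last event.
        candidates = {a + (b[-1],) for a in level for b in level if a[1:] == b[:-1]}
        level = [c for c in sorted(candidates) if support(c) >= min_support]
    return frequent

# Hypothetical event logs from three play sessions.
sessions = [
    ["power_pill", "blinky_left", "eat_ghost"],
    ["power_pill", "fruit", "blinky_left"],
    ["blinky_left", "power_pill", "eat_ghost"],
]
patterns = gsp(sessions, min_support=2)
for pattern, count in patterns.items():
    print(pattern, count)
```

In this toy log the pattern (power_pill, blinky_left) is supported by two of the three sessions, whereas no 3-sequence reaches the threshold, so the search stops after the second level.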


2.7.2.3 Frequent Pattern Mining for Ms Pac-Man

Patterns of events or sequences can be extracted to assist the control of Ms Pac-Man.
Itemsets may be identified across successful events of expert Ms Pac-Man players
given a particular support count. For instance, an Apriori algorithm running on
events across several different expert players might reveal that a frequent 2-itemset
is the following: <player went for the upper left corner first, player ate the bottom
right power pill first>. Such information can be useful explicitly for designing rules
for controlling Ms Pac-Man.

Beyond itemsets, frequencies of ghost events can be considered for playing Ms
Pac-Man. For example, by running GSP on extracted attributes of ghosts it might
turn out that when Ms Pac-Man eats a power pill it is very likely that the Blinky
ghost moves left (<power pill, Blinky left>). Such frequent sequences can form
additional inputs of any Ms Pac-Man controller, e.g., an ANN. A later chapter details
an example of this frequent sequence mining approach in a 3D prey-predator game.


2.7.3 Further Reading 

A general introduction to frequent pattern mining is offered in 0. The Apriori 
algorithm is detailed in the original article of Agrawal and Srikant Q whereas GSP
is covered thoroughly in [652].


2.8 Notable Hybrid Algorithms 

AI methods can be interwoven in numerous ways to yield new sophisticated
algorithms that aggregate the strengths of their combined parts, often with an
emergent gestalt effect. You can, for instance, let GAs evolve your behavior trees or
FSMs; you can instead empower MCTS with ANN estimators for tree pruning; or you can
add a component of local search in every search algorithm covered earlier. We call
the resulting combinations of AI methods hybrid algorithms, and in this section we
cover the two hybrid game AI algorithms that are, in our opinion, the most
influential: neuroevolution and temporal difference learning with ANN function
approximators.


2.8.1 Neuroevolution

The evolution of artificial neural networks, or neuroevolution, refers to the
design of artificial neural networks (their connection weights, their topology, or
both) using evolutionary algorithms [786]. Neuroevolution has been successfully
applied in the domains of artificial life, robot control, generative systems and
computer games. The algorithm’s wide applicability is primarily due to two main
reasons. First, many AI problems can be viewed as function optimization problems
whose underlying general function can be approximated via an ANN. Second,
neuroevolution is a method grounded in biological metaphors and evolutionary theory
and inspired by the way brains evolve [567].

This evolutionary (reinforcement) learning approach is applicable either when
the error function available is not differentiable or when target outputs are not
available. The former may occur, for instance, when the activation functions employed
in the ANN are not continuous and, thus, not differentiable. (This is a prominent
phenomenon, for instance, in the compositional pattern producing networks [653].)
The latter may occur in a domain for which we have no samples of good (or bad)
behavior or in which it is impossible to define objectively what a good behavior
might be. Instead of backpropagating the error and adjusting the ANN based on
gradient search, neuroevolution designs ANNs via metaheuristic (evolutionary) search.
In contrast to supervised learning, neuroevolution does not require a dataset of
input-output pairs to train ANNs. Rather, it requires only a measure of an ANN’s
performance on the problem under investigation, for instance, the score of a
game-playing agent that is controlled by an ANN.

The core algorithmic steps of neuroevolution are as follows: 


1. A population of chromosomes that represent ANNs is evolved to optimize
a fitness function that characterizes the utility (quality) of the ANN
representation. The population of chromosomes (ANNs) is typically initialized
randomly.

2. Each chromosome is encoded into an ANN which is, in turn, tested on the
task under optimization.

3. The testing procedure assigns a fitness value for each ANN of the population.
The fitness of an ANN defines its measure of performance on the task.

4. Once the fitness values for all genotypes in the current population are
determined, a selection strategy (e.g., roulette-wheel, tournament) is applied to
pick the parents for the next generation.

5. A new population of offspring is generated by applying genetic operators
on the selected ANN-encoded chromosomes. Mutation and/or crossover are
applied on the chromosomes in the same way as in any evolutionary algorithm.

6. A replacement strategy (e.g., steady-state, elitism, generational) is applied
to determine the final members of the new population.

7. Similarly to a typical evolutionary algorithm, the generational loop (steps 2
to 6) is repeated until we exhaust our computational budget or we are happy
with the obtained fitness of the current population.
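The steps above can be sketched in plain Python. This is an illustrative toy, not a prescribed implementation: it evolves only the nine weights of a fixed 2-2-1 network, and a small XOR task stands in for the game score that would normally provide the fitness. The network size, the operators (tournament selection, one-point crossover, Gaussian mutation, single-elite replacement) and all parameter values are our own assumptions.

```python
import math
import random

N_WEIGHTS = 9  # fixed 2-2-1 network: 6 hidden-layer parameters + 3 output parameters
XOR_CASES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def decode(chromosome):
    """Step 2: turn a flat weight vector into a callable network."""
    w = chromosome
    def network(x1, x2):
        h1 = math.tanh(w[0] * x1 + w[1] * x2 + w[2])
        h2 = math.tanh(w[3] * x1 + w[4] * x2 + w[5])
        return math.tanh(w[6] * h1 + w[7] * h2 + w[8])
    return network

def fitness(chromosome):
    """Step 3: negative squared error on the toy task (higher is better)."""
    network = decode(chromosome)
    return -sum((network(*inputs) - target) ** 2 for inputs, target in XOR_CASES)

def tournament(population, rng):
    """Step 4: binary tournament selection."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b

def evolve(pop_size=30, generations=50, mutation_rate=0.2, sigma=0.5, seed=1):
    rng = random.Random(seed)
    # Step 1: random initial population.
    population = [[rng.uniform(-1, 1) for _ in range(N_WEIGHTS)]
                  for _ in range(pop_size)]
    best_fitness_history = []
    for _ in range(generations):  # Step 7: generational loop.
        elite = max(population, key=fitness)
        best_fitness_history.append(fitness(elite))
        offspring = [elite[:]]  # Step 6: elitism keeps the best individual unchanged.
        while len(offspring) < pop_size:
            # Step 5: one-point crossover followed by Gaussian mutation.
            parent_a, parent_b = tournament(population, rng), tournament(population, rng)
            cut = rng.randrange(1, N_WEIGHTS)
            child = parent_a[:cut] + parent_b[cut:]
            child = [w + rng.gauss(0, sigma) if rng.random() < mutation_rate else w
                     for w in child]
            offspring.append(child)
        population = offspring
    return max(population, key=fitness), best_fitness_history

best, history = evolve()
print("best fitness per generation:", history[0], "->", fitness(best))
```

Because the elite is copied unchanged into every new population, the best fitness can never decrease from one generation to the next. In a game setting the fitness function would instead run the decoded ANN as the controller and return, for instance, the score obtained.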


Typically there are two types of neuroevolution approaches: those that consider
the evolution of a network’s connection weights only and those that evolve both the
connection weights and the topology of the network (including connection types
and activation functions). In the former type of neuroevolution, the weight vector is
encoded and represented genetically as a chromosome; in the latter type, the genetic
representation includes an encoding of the ANN topology. Beyond simple MLPs,
the ANN types that have been considered for evolution include the NeuroEvolution
of Augmenting Topologies (NEAT) [655] and the compositional pattern producing
networks [653].

Neuroevolution has found extensive use in the games domain in roles such as
those of evaluating the state-action space of a game, selecting an appropriate
action, selecting among possible strategies, modeling opponent strategies, generating
content, and modeling player experience [567]. The algorithm’s efficiency,
scalability, broad applicability, and open-ended learning are a few of the reasons
that make neuroevolution a good general method for many game AI tasks [567].

2.8.1.1 Neuroevolution for Ms Pac-Man 

One simple way to implement neuroevolution in Ms Pac-Man is to first design an
ANN that considers the game state as input and outputs actions for Ms Pac-Man.
The weights of the ANN can be evolved using a typical evolutionary algorithm
and following the steps of neuroevolution as described above. The fitness of each
ANN in the population is obtained by equipping Ms Pac-Man with each ANN in the
population and letting her play the game for a while. The performance of the agent
within that simulation time (e.g., the score) can determine the fitness value of the
ANN. Figure 2.18 illustrates the steps of ANN encoding and fitness assignment in
this hypothetical implementation of neuroevolution in Ms Pac-Man.


2.8.2 TD Learning with ANN Function Approximators 

Reinforcement learning typically uses tabular representations to store knowledge.
As mentioned earlier in the RL section, representing knowledge this way may drain
our available computational resources since the size of the look-up table increases
exponentially with respect to the action-state space. The most popular way of
addressing this challenge is to use an ANN as a value (or Q value) approximator,
thereby replacing the table. Doing so makes it possible to apply the algorithm to
larger spaces of action-state representations. Further, an ANN as a function
approximator of Q, for instance, can handle problems with continuous state spaces
which are infinitely large.

Fig. 2.18 Neuroevolution in Ms Pac-Man. The figure visualizes step 2 (ANN encoding)
and step 3 (fitness assignment) of the algorithm for assigning a fitness value to
chromosome 2 in the population (of size P). In this example, only the weights of the
ANN are evolved. The n weights of the chromosome are first encoded in the ANN and
then the ANN is tested in Ms Pac-Man for a number of simulation steps (or game
levels). The result of the game simulation determines the fitness value (f_2) of the
ANN.

In this section, we outline two milestone examples of algorithms that utilize the
ANN universal approximation capacity for temporal difference learning. The
algorithms of TD-Gammon and deep Q network have been applied, respectively, to
master the game of backgammon and play Atari 2600 arcade games at super-human
level. Both algorithms are applicable to any RL task beyond these particular games,
but the games that made them popular are used to describe the algorithms below.


2.8.2.1 TD-Gammon 

Arguably one of the most popular success stories of AI in games is that of Tesauro’s
TD-Gammon software that plays backgammon at grandmaster level [689]. The
learning algorithm was a hybrid combination of an MLP and a temporal difference
variant named TD(λ); see Chapter 7 of [672] for further details on the TD(λ)
algorithm.

TD-Gammon used a standard multilayer neural network to approximate the value
function. The input of the MLP was a representation of the current state of the board
(Tesauro used 192 inputs) whereas the output of the MLP was the predicted
probability of winning given the current state. Rewards were defined as zero for all
board states except those on which the game was won. The MLP was then trained
iteratively by playing the game against itself and selecting actions based on the
estimated probability of winning. Each game was treated as a training episode
containing a sequence of positions which were used to train the weights of the MLP by
backpropagating temporal difference errors of its output.
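The weight update at the heart of this training scheme can be illustrated on a much smaller problem. The sketch below applies TD(λ) with eligibility traces to a five-state random walk in which the only non-zero reward is given for "winning" (exiting on the right). It is not Tesauro's setup: the random walk, the one-weight-per-state linear approximator that stands in for the MLP, and all parameter values are assumptions made for illustration.

```python
import random

def td_lambda_random_walk(n_states=5, episodes=10000, alpha=0.01, lam=0.7,
                          gamma=1.0, seed=0):
    """Estimate the probability of exiting on the right from each state with TD(lambda)."""
    rng = random.Random(seed)
    # One weight per state: with one-hot features the gradient of the value
    # with respect to the weights is simply the feature vector of the state.
    weights = [0.5] * n_states
    for _ in range(episodes):
        trace = [0.0] * n_states      # eligibility traces, reset every episode
        state = n_states // 2         # every episode starts in the middle
        while True:
            next_state = state + rng.choice((-1, 1))
            if next_state < 0:                    # left exit: episode lost
                reward, next_value, done = 0.0, 0.0, True
            elif next_state >= n_states:          # right exit: episode won
                reward, next_value, done = 1.0, 0.0, True
            else:                                 # reward is zero everywhere else
                reward, next_value, done = 0.0, weights[next_state], False
            # Temporal difference error of the current prediction.
            td_error = reward + gamma * next_value - weights[state]
            # Decay all traces, then mark the state that was just visited.
            trace = [gamma * lam * e for e in trace]
            trace[state] += 1.0
            # Move every weight along its trace, scaled by the TD error.
            weights = [w + alpha * td_error * e for w, e in zip(weights, trace)]
            if done:
                break
            state = next_state
    return weights

values = td_lambda_random_walk()
print([round(v, 2) for v in values])
```

For this walk the true winning probabilities are 1/6, 2/6, ..., 5/6 from left to right, so the learned values should rise towards the right-hand exit. With an MLP in place of the one-hot weights, the trace would accumulate the backpropagated gradient of the network output instead of the feature vector.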

TD-Gammon 0.0 played about 300,000 games against itself and managed to play as
well as the best backgammon computer of its time. While TD-Gammon 0.0 did not win
the performance horse race, it gave us a first indication of what is achievable with
RL even without any backgammon expert knowledge integrated in the AI algorithm. The
next iteration of the algorithm (TD-Gammon 1.0) naturally incorporated expert
knowledge through specialized backgammon features that altered the input of the MLP
and achieved substantially higher performance. From that point onwards the number of
hidden neurons and the number of self-played games largely determined the version of
the algorithm and its resulting capacity. From TD-Gammon 2.0 (40 hidden neurons) to
TD-Gammon 2.1 (80 hidden neurons) the performance of TD-Gammon gradually increased
and, with TD-Gammon 3.0 (160 hidden neurons), it reached the playing strength of the
best human player in backgammon [689].


2.8.2.2 Deep Q Network 

While the combination of RL and ANNs results in very powerful hybrid algorithms,
the performance of the algorithm traditionally depended on the design of the input
space for the ANN. As we saw earlier, even the most successful applications of RL,
such as the TD-Gammon agent, managed to reach human-level playing performance by
integrating game-specific features in the input space, thereby adding expert
knowledge about the game. It was not until very recently that the combination of RL
and ANNs managed to reach human-level performance in a game without considering
ad-hoc designed features, but rather discovering them merely through learning. A team
from Google’s DeepMind [464] developed a reinforcement learning agent called deep Q
network (DQN) that trains a deep convolutional ANN via Q-learning. DQN managed to
reach or exceed human-level playing performance in 29 out of the 49 arcade (Atari
2600) games of the Arcade Learning Environment [40] it was trained on [464].

DQN is inspired by and based upon TD-Gammon since it uses an ANN as the
function approximator for TD learning via gradient descent. As in TD-Gammon, the
gradient is calculated by backpropagating the temporal difference errors. However,
instead of using TD(λ) as the underlying RL algorithm, DQN uses Q-learning. Further,
the ANN is not a simple MLP but rather a deep convolutional neural network.
DQN played each game of ALE for a large number of frames (50 million frames).
This amounts to about 38 days of playing time for each game [464].

The DQN analyses a sequence of four game screens simultaneously and approximates
the future game score for each possible action given its current state. In
particular, the DQN uses the pixels from the four most recent game screens as its
inputs, resulting in an ANN input size of 84 × 84 × 4 (84 × 84 being the screen size
in pixels). No other game-specific knowledge was given to the DQN beyond the screen
pixel information. The architecture used for the convolutional ANN has three hidden
layers that yield 32 feature maps of size 20 × 20, 64 of size 9 × 9 and 64 of size
7 × 7, respectively. The first (low-level) layers of the DQN process the pixels of
the game screen and extract specialized visual features. The convolutional layers are
followed by a fully connected hidden layer and an output layer. Each hidden layer is
followed by a rectifier nonlinearity. Given a game state represented by the network’s
input, the outputs of the DQN are the estimated optimal action values (optimal
Q-values) of the corresponding state-action pairs. The DQN is trained to approximate
the Q-values (the actual score of the game) by receiving immediate rewards from the
game environment. In particular, the reward is +1 if the score increases between two
successive time steps (frames), -1 if the score decreases, and 0 otherwise. DQN uses
an ε-greedy policy for its action-selection strategy. It is worth mentioning that, at
the time of writing, there are newer and more efficient implementations of the deep
reinforcement learning concept such as the Asynchronous Advantage Actor-Critic (A3C)
algorithm [463].
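Three of these ingredients are simple enough to check with a few lines of Python: the feature-map sizes, the clipped reward and the ε-greedy rule. The sketch below is illustrative only. The kernel sizes and strides (8 × 8 with stride 4, 4 × 4 with stride 2, 3 × 3 with stride 1) are not stated in the text above; they are taken from the published DQN architecture, and the function names are our own.

```python
import random

# (number of filters, kernel size, stride) for the three convolutional layers.
# Assumption: values from the published DQN architecture, not from this text.
DQN_CONV_LAYERS = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]

def conv_output_size(input_size, kernel, stride):
    """Side length of a feature map for a convolution without padding."""
    return (input_size - kernel) // stride + 1

def feature_map_shapes(input_size=84):
    """Shapes (filters, height, width) after each convolutional layer."""
    shapes = []
    size = input_size
    for filters, kernel, stride in DQN_CONV_LAYERS:
        size = conv_output_size(size, kernel, stride)
        shapes.append((filters, size, size))
    return shapes

def clipped_reward(previous_score, current_score):
    """+1 if the score went up between two frames, -1 if it went down, 0 otherwise."""
    difference = current_score - previous_score
    return (difference > 0) - (difference < 0)

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the highest-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)

print(feature_map_shapes())           # [(32, 20, 20), (64, 9, 9), (64, 7, 7)]
print(clipped_reward(100, 150))       # 1
print(epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0))  # 1
```

Starting from an 84 × 84 input, these layer settings reproduce the 20 × 20, 9 × 9 and 7 × 7 feature maps mentioned above.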


2.8.2.3 TD Learning with ANN Function Approximator for Ms Pac-Man

We can envisage a DQN approach for controlling Ms Pac-Man in a similar fashion
to that with which ALE agents were trained [464]. A deep convolutional neural network
scans the level image pixel by pixel (see Fig. 2.19). The image goes through a number
of convolution and fully connected layers which eventually feed the input of an MLP
that outputs the four possible actions for Ms Pac-Man (keep direction, move
backwards, turn left, turn right). Once an action is applied, the score of the game
is used as the immediate reward for updating the weights of the deep network (the
convolutional ANN and the MLP). By playing for a sufficient time period the
controller gathers experience (image snapshots, actions, and corresponding rewards)
which trains the deep ANN to approximate a policy that maximizes the score for Ms
Pac-Man.
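The experience-gathering part of such a controller can be sketched independently of the network itself. In the hypothetical sketch below, a bounded buffer stores (state, action, reward, next state, done) tuples and a helper computes the Q-learning targets towards which the network outputs would be regressed. The buffer, the discount factor and the names ReplayBuffer and q_learning_targets are assumptions borrowed from common DQN practice, and q_function is a placeholder for the deep network.

```python
import random
from collections import deque

ACTIONS = ("keep_direction", "move_backwards", "turn_left", "turn_right")

class ReplayBuffer:
    """Bounded store of experience tuples; the oldest entries are dropped first."""

    def __init__(self, capacity=10000):
        self.memory = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        # Sample without replacement, never asking for more than is stored.
        return rng.sample(list(self.memory), min(batch_size, len(self.memory)))

def q_learning_targets(batch, q_function, gamma=0.99):
    """Targets r + gamma * max_a Q(s', a); no bootstrapping at terminal states."""
    targets = []
    for _state, _action, reward, next_state, done in batch:
        bootstrap = 0.0 if done else max(q_function(next_state))
        targets.append(reward + gamma * bootstrap)
    return targets

# Usage with a stand-in Q-function that returns one value per action.
def stub_q(state):
    return [0.0, 1.0, 2.0, 3.0]

buffer = ReplayBuffer(capacity=2)
buffer.add("frame_0", 0, 1.0, "frame_1", False)
buffer.add("frame_1", 2, -1.0, "frame_2", True)
print(q_learning_targets(list(buffer.memory), stub_q, gamma=0.5))  # [2.5, -1.0]
```

In a full implementation the states would be stacks of screen images, and the squared difference between each target and the network output for the action taken would be backpropagated through the deep network.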


2.8.3 Further Reading 


For a recent thorough survey on the application of neuroevolution in games the
reader may refer to [567]. For a complete review of neuroevolution please refer to
Floreano et al. [205]. CPPNs and NEAT are covered in detail in [653] and [655],
respectively. TD-Gammon and DQN are covered in detail in [689] and [464],
respectively. Both are also placed within the greater RL field in the upcoming
second edition of [672]. Details about the A3C algorithm can be found in [463] and
implementations of the algorithm can be found directly as part of TensorFlow.

Fig. 2.19 A deep Q-learning approach for Ms Pac-Man. Following [464], the network’s
first part contains a set of convolution layers which are followed by rectifier
nonlinearities. The final layers of the DQN we present in this example are fully
connected employing ReLUs, as in [464].


2.9 Summary 

This chapter covered the AI methods we feel the reader of this book needs to be
familiar with. We expect, however, that our readers have a basic background in AI
or have completed a course in fundamentals of AI prior to reading this book. Hence,
the algorithms were not covered in detail since the emphasis of this book is on
the application of AI within the domain of games and not on AI per se. On that
basis, we used the game of Ms Pac-Man as the overarching application testbed of
all algorithms throughout this chapter.

The families of algorithms we discussed include traditional ad-hoc behavior
authoring methods (such as finite state machines and behavior trees), tree search
(such as best-first, Minimax and Monte Carlo tree search), evolutionary computation
(such as local search and evolutionary algorithms), supervised learning (e.g., neural
networks, support vector machines and decision trees), reinforcement learning (e.g.,
Q-learning), unsupervised learning (such as clustering and frequent pattern mining),
and hybrid algorithms such as evolving artificial neural networks and artificial
neural networks as approximators of expected rewards.

With this chapter we reached the end of the first, introductory, part of the book.
The next part begins with a chapter on the most traditional and widely explored task
of AI in games: playing!


Chapter 3

Playing Games 


When most people think of AI in games they think of an AI playing the game, or
controlling the non-player characters you meet in the game. This might be because
of the association between AI and the idea of autonomous action, or the association
between game characters and robots. While playing games is far from the only
interesting application for AI in games, it is a very important one and the one with
the longest history. Many methods for content generation and player modeling, both
covered in later chapters, are also dependent on methods for playing games, and
therefore it makes sense to discuss playing games before content generation and
player modeling.

This chapter is devoted to AI methods for playing games, including methods for
creating interesting non-player characters in games. While winning a game, appearing
human-like, and providing entertainment are very different objectives, they face
many of the same challenges. In fact, there are many different reasons why one
might want to use AI methods to play a game. We start the chapter by discussing
these various motivations (Section 3.1). Regardless of why you want to use AI to
play a game, which methods you can effectively use to play the game is determined
by the various characteristics of the game that, in turn, affect the choice and
design of the AI method. So the next section in this chapter (Section 3.2) is devoted
to characterizing games and AI algorithms according to several criteria. Once you
have understood your game sufficiently, you can make an informed choice of which
algorithm to use to play it. The following section (Section 3.3) is devoted to
discussing the various methods that can be used to play games, and how the right
choice of method depends on the characteristics of the game. Most of the methods
discussed here will have been briefly and somewhat abstractly discussed in Chapter 2,
but this chapter will go into some depth about the application of these methods to
playing games.

Next, a long section (Section 3.4) divides up the space of games by game genre,
and discusses how AI methods can be applied in various types of games. This section
will contain plenty of examples from the literature, and some from published
games. This section also introduces several commonly used game-based frameworks
and competitions for testing AI game-playing algorithms. Throughout the chapter we
will mostly discuss the use of AI methods to play to win, but also make
numerous references to the experience-creation aspect of game-playing.


3.1 Why Use AI to Play Games? 

The question of why you might want to deploy some kind of artificial intelligence
to play a game can be reduced to two more specific questions:


Is the AI playing to win?

The question here is whether achieving as high a performance as possible in the
game is the overarching goal of the AI method. High performance here means getting
a high score, winning over the opponent, surviving for a long time or similar. It is
not always possible to define what high performance and “playing to win” mean. For
example, The Sims (Electronic Arts, 2000) has no clear winning state and the winning
condition in Minecraft (Mojang, 2011) is not strongly related to playing the game
well. In a very large number of games, however, from Tetris (Alexey Pajitnov and
Vladimir Pokhilko, 1984) to Go to the Halo (Microsoft Studios, 2001-2015) series, it
is straightforward to define what playing better means. Still, not all players play
to win, and few players play to win in every game all the time. Players play to pass
time, relax, test new strategies, explore the game, role-play, keep their friends
company and so on (a more detailed discussion of this topic follows in a later
chapter). An AI algorithm might likewise be used in a number of roles beyond
simply playing as well as possible. For example, the agent might play in a
human-like manner, play in an entertaining manner, or behave predictably. It is
important to note that optimizing an agent for playing a game to win might be at odds
with some of the other ways of playing: many high-performing AI agents play in
distinctly non-human, boring and/or unpredictable ways, as we will see in some case
studies.


Is the AI taking the role of a human player? 

Some games are single-player, and some games are multi-player where all players
are human. This is particularly true for classic board games. But many, probably
most, video games include various non-player characters. These are controlled by the
computer software in some way; in fact, for many game developers “game AI” refers to
the program code that controls the NPCs, regardless of how simple or sophisticated
that code is. Obviously, the role of NPCs varies sharply between games, and within
games. In the discussion of this chapter we refer to non-player roles as those that
a human could not take, or would not want to take. Thus, all roles in an exclusively
multi-player first-person shooter (FPS) such as Counter-Strike (Valve Corporation,
2000) are player roles, whereas a typical single-player role-playing game (RPG) such
as The Elder Scrolls V: Skyrim (Bethesda Softworks, 2011) has only one player role;
the rest are non-player characters. In general, non-player roles have more limited
possibilities than player roles.

Fig. 3.1 Why use AI to play games? The two possible goals (win, experience) AI can
aim for and the two roles (player, non-player) AI can take in a gameplaying setting.
We provide a summary of motivations and some indicative examples for each of the
four AI uses for gameplaying.

In summary, AI could be playing a game to win or for the experience of play
either by taking the role of the player or the role of a non-player character. This
yields four core uses of AI for playing games as illustrated in Fig. 3.1. With these
distinctions in mind, we will now look at these four key motivations for building
game-playing AI in further detail.


3.1.1 Playing to Win in the Player Role 

Perhaps the most common use of AI together with games in academic settings is
to play to win, while taking the role of a human player. This is especially common
when using games as an AI testbed. Games have been used to test the capabilities
and performance of AI algorithms for a very long time, as we discussed in
Section 1.2. Many of the milestones in AI and games research have taken the form of
some sort of AI program beating the best human player in the world at some game.
See, for example, IBM’s Deep Blue winning over Garry Kasparov in Chess, Google
DeepMind’s AlphaGo winning over Lee Sedol [629] and Ke Jie in Go, and IBM’s
Watson winning Jeopardy! [201]. All of these were highly publicized events widely
seen as confirmations of the increasing capabilities of AI methods. As discussed in
Chapter 2, AI researchers are now increasingly turning to video games to find
appropriate challenges for their algorithms. The number of active competitions
associated with the IEEE CIG and AIIDE conferences is testament to this, as is
DeepMind’s and Facebook AI Research’s choice of StarCraft II (Blizzard Entertainment,
2015) as a testbed for their research.

Games are excellent testbeds for artificial intelligence for a number of reasons,
as elaborated on in Section 1.3. An important reason is that games are made to test
human intelligence. Well-designed games exercise many of our cognitive abilities.
Much of the fun we have in playing games comes from learning the games through
playing them [351], meaning that well-designed games are also great teachers. This,
in turn, means that they offer the kind of gradual skill progression that allows for
testing of AI at different capability levels.

There are some reasons besides AI benchmarking for why you might want to use
an AI in the place of a human to play games to win. For example, there are some
games where you need strong AI to provide a challenge to players. This includes
many strategic games of perfect information, such as classic board games, including
Chess, Checkers and Go. However, for games with hidden information, it is often
easier to provide challenge by simply “cheating”, for example, by giving the AI
player access to the hidden state of the game or even by modifying the hidden state
so as to make it harder to play for the human. For example, in the epic strategy game
Civilization (MicroProse, 1991), all civilizations can be played by human players.
However, playing any Civilization game well under the same conditions as a human
is very challenging, and there is, to our knowledge, no AI capable of playing these
games as well as a good human player. Therefore, when playing against several
computer-controlled civilizations, the game typically cheats by providing these with
preferential conditions in various ways.

Another use case for AI that plays to win in a player role is to test games. When
designing a new game, or a new game level, you can use a game-playing agent
to test whether the game or level is playable, an approach known as simulation-based
testing. However, in many cases you want the agent to also play the game in a
human-like manner to make the testing more relevant; see below on playing for
experience.

Historically, the use of AI to play to win in a player role has been so dominant in
academic work that some researchers have not even considered other roles for AI in
playing games. In game development, on the other hand, this particular motivation
for game-playing AI is much rarer; most game-playing AI in existing games
is focused on non-player roles and/or playing for experience. This mismatch has
historically contributed to the lack of understanding between academia and industry
on game AI. In recent years, however, there has been a growing understanding of the
multitude of motivations for game-playing AI.


3.1.2 Playing to Win in a Non-player Role

Non-player characters are very often designed to not offer maximum challenge 
or otherwise be as effective as possible, but instead to be entertaining or human- 
like; see below for non-player characters playing for experience. However, there 
are instances when you want a non-player character to play as well as possible. As 
mentioned above, strategy games such as Civilization (MicroProse, 1991) have an 
(unanswered) need for high-performing non-cheating opponents, though here we 
are talking about playing roles that other human players could in principle have 
taken. Other strategy games, such as XCOM: Enemy Unknown (2K Games, 2012), 
have playing roles that humans would not play, creating a need for NPC AI playing 
to win. 

Other times, creating an NPC playing to win is a necessary precursor to creating
an NPC playing for experience. For example, in a racing game, you might want to
implement “rubber band AI” where the NPC cars adapt their speed to the human
player, so that they are never too far behind or ahead. Doing this is easy, but only
if you already have an AI controller that can play the game well, either through
actually playing the game well or through cheating in a way that cannot easily be
detected. The performance of the controller can then be reduced when necessary so
as to match the player’s performance.
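As a minimal sketch of the idea, and not of any shipped game's implementation, the function below scales the speed produced by a competent racing controller according to the gap between the NPC and the player. The function name, the linear gain and the clamping bounds are all hypothetical; the upper bound above 1.0 corresponds to the mild "cheating" mentioned above.

```python
def rubber_band_speed(base_speed, npc_position, player_position,
                      gain=0.01, min_factor=0.7, max_factor=1.2):
    """Scale the controller's speed so the NPC stays close to the player.

    Positions are distances travelled along the track. All parameters are
    illustrative assumptions rather than values from a specific game.
    """
    gap = player_position - npc_position       # positive when the NPC is behind
    factor = 1.0 + gain * gap                  # speed up when behind, slow down when ahead
    factor = max(min_factor, min(max_factor, factor))  # keep the adjustment subtle
    return base_speed * factor

print(rubber_band_speed(100.0, npc_position=50.0, player_position=50.0))   # level with the player
print(rubber_band_speed(100.0, npc_position=300.0, player_position=0.0))   # far ahead: throttled
print(rubber_band_speed(100.0, npc_position=0.0, player_position=300.0))   # far behind: boosted
```

Clamping the factor keeps the adjustment hard to notice, and with max_factor set to 1.0 the NPC would never exceed what its controller can legitimately achieve.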


3.1.3 Playing for Experience in the Player Role 

Why would you want an agent that takes the role of a human player, but that does not
focus on winning? For example, when you want a human-like agent. Perhaps the
most important reason for such agents is alluded to above: simulation-based testing.
This is important both when designing games and game content manually, and when
generating content procedurally; in the latter case, the quality of the game content
is often evaluated automatically with the help of an agent playing the game, as
discussed in a later chapter. When trying to see how the game would be played by a
human it is therefore important that the agent plays in a human-like manner, meaning
that it has performance comparable to a human, has similar reaction speed, makes the
same sort of mistakes that a human would make, is curious about and explores the same
areas as a human would, etc. If the AI agent plays significantly differently from how
a human would play, it might give the wrong information about, e.g., whether a
game is winnable (it might be winnable but only if you have superhuman reflexes)
or whether a game mechanic is used (maybe a human would use it, but not an AI
that tries to play optimally).

Another situation where human-like play is necessary is when you want to
demonstrate how to play a level to a human player. A common feature of games
is some kind of demo mode, which shows the game in action. Some games even
have a demonstration feature built into the core gameplay mode. For example, New
Super Mario Bros (Nintendo, 2006) for the Nintendo Wii will show you how to play
a particular part of a level if you fail it repeatedly. The game simply takes over the
controls and plays for you for about 10 to 20 seconds, and lets you continue
afterwards. If all the level content is known beforehand, and there are no other
players, such demonstrations can be hardcoded. If some parts of the game are
user-designed, or procedurally generated, the game needs to generate these
demonstrations itself.

Playing in a “human-like” fashion may seem a rather fuzzy and subjective aim,
and it is. There are many ways in which a typical AI agent plays differently from
a typical human player. How humans and AIs differ depends on the algorithm used
to play the game, the nature of the game itself, and a multitude of other factors. To
investigate these differences further, and spur the development of agents that can
play in a human-like manner, two different Turing test-like competitions have been
held. The 2K BotPrize was held from 2008 to 2013, and challenged competitors
to develop agents that could play the FPS Unreal Tournament 2004 (Epic Games,
2004) in such a way that human participants thought that the bots were human
[263, 262, 647]. Similarly, the Turing test track of the Mario AI Competition let people
submit agents that played Super Mario Bros (Nintendo, 1985) and were judged by
human onlookers as to whether they were human or not [619, 717]. While it would take
us too far afield to go through all of the results of these competitions here, there are some
very obvious signs of non-humanness that recur across games for many types of AI
agents. These include having extremely fast reactions, switching between actions
faster than a human could, not attempting actions which fail (because of having
too good a model of the outcome of actions), not performing unnecessary actions (such as
jumping when one could just be running) and not hesitating or stopping to think.
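
One of the signs listed above, superhuman reaction speed, can be illustrated with a minimal sketch. The wrapper is not taken from any of the competition entries; the class name `DelayedAgent`, the `delay` parameter and the `idle_action` default are assumptions made purely for illustration. It makes an arbitrary policy act on observations that are a fixed number of frames old.

```python
from collections import deque

_EMPTY = object()  # sentinel marking "no observation seen yet"


class DelayedAgent:
    """Hypothetical wrapper: the wrapped policy sees observations `delay` frames late."""

    def __init__(self, policy, delay, idle_action=None):
        self.policy = policy            # any callable: observation -> action
        self.idle_action = idle_action  # returned until the first delayed observation arrives
        self.buffer = deque([_EMPTY] * delay)

    def act(self, observation):
        # Queue the fresh observation and react to the oldest queued one instead.
        self.buffer.append(observation)
        stale = self.buffer.popleft()
        return self.idle_action if stale is _EMPTY else self.policy(stale)
```

With `delay=2`, the agent idles for two frames and then responds to what it saw two frames earlier. A reaction delay alone is unlikely to make an agent pass for human, but it removes one of the most obvious giveaways.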

Of course, not all players of a game play in the same way. In fact, as discussed 
further in Chapter]^ if one analyzes a set of play traces of any game one can often 
find a number of player “archetypes” or “personas”, clusters of players who play the 
game in markedly different ways in terms of e.g., aggression, speed, curiosity and 
skill. Within work on AI that plays games in human-like styles, there has been work 
both on learning and mirroring the playstyle of individual players II422114231151 II 
I3281l603ll and on learning to play games in the style of one of several personas 02671 
1 ^ . 


3.1.4 Playing for Experience in a Non-player Role 

Almost certainly the most common goal for game-playing AI in the game industry 
is to make non-player characters act, almost always in ways which are not primarily 
meant to beat the player or otherwise “win” the game (for many NPCs it may not 
even be defined what winning the game means). NPCs may exist in games for many, 
sometimes overlapping, purposes: to act as adversaries, to provide assistance and 
guidance, to form part of a puzzle, to tell a story, to provide a backdrop to the
action of the game, to be emotively expressive and so on [724]. The sophistication
and behavioral complexity of NPCs likewise vary widely, from the regular left-right
movements of the aliens in Space Invaders (Midway, 1978) and Koopas in the Super
Mario Bros (Nintendo, 1985-2016) series to the nuanced and varied behavior of
non-player characters in Bioshock Infinite (2K Games, 2013) and the alien in Alien:
Isolation (Sega, 2014).

Depending on the role of the NPC, very different tasks can be asked of the AI 
algorithms that control it. (It can certainly be argued that many of the scripts that
control NPCs cannot truthfully be described as artificial intelligence in any conventional
way, but we will stick with the acronym here as it is commonly used for
all code that controls non-player characters in the game industry.) In many cases
what the game designers look for is the illusion of intelligence: for the player to
believe that the NPC in some sense is intelligent even though the code controlling it
is very simple. Human-likeness in the sense discussed in the previous section might
or might not be the objective here, depending on what kind of NPC it is (a robot or
a dragon should perhaps not behave in too human-like a manner).

In other cases, the most important feature of an NPC is its predictability. In a 
typical stealth game, a large part of the challenge is for the player to memorize and 
predict the regularities of guards and other characters that should be avoided. In such 
cases, it makes sense that the patrols are entirely regular, so that their schedule can 
be gleaned by the player. Similarly, the boss monsters in many games are designed 
to repeat certain movements in a sequence, and are only vulnerable to the player’s 
attack when in certain phases of the animation cycle. In such cases, too “intelligent” 
and adaptive behavior would be incompatible with the game design. 

It should be noted that even in cases where you would expect to need supple, 
complex behavior from NPCs, an agent that plays to win might be very problem- 
atic. Many high-performing strategies are seen as very boring by the player, and 
prime examples of “unsportsmanlike” behavior. For example, in an experiment with 
building high-performing AI for a turn-based strategy game, it was found that one 
of the solutions (based on neuroevolution) was extremely boring to play against,
as it simply took a defensive position and attacked any incoming units with long-distance
attacks [490]. Similarly, camping (staying stationary in a protected position
and waiting for enemies to expose themselves to fire) is a behavior that is generally
frowned on and often banned in FPS games, but it is often highly effective and easy
to learn for an AI (incidentally, real-life military training often emphasizes camping-like
tactics; what is effective is often not fun). Another interesting example is the
work by Denzinger et al. [165] in which an evolutionary algorithm found that the
best way to score a goal in FIFA 99 (Electronic Arts, 1999) was by forcing a penalty
kick. The evolutionary process found a local optimum in the fitness landscape corresponding
to a sweet spot or exploit of the game’s mechanics which yielded highly
efficient, yet predictable and boring gameplay. Exploiting the game’s bugs to win
is a creative strategy that is followed not only by AIs but also by human players.


3.1.5 Summary of AI Game-Playing Goals and Roles

We argued above that playing to win in the player role has been overemphasized, to 
the point of neglecting other perspectives, in much of academic research. In the same 
way, work on AI in the game industry has generally overemphasized playing for
experience in a non-player role, to the point of neglecting other perspectives. This
has led to an emphasis in the industry on behavior authoring methods such as finite
state machines and behavior trees, as AI methods based on search, optimization
and learning have been seen as not conducive to playing for experience; a common
gripe has been a perceived lack of predictability and authorial control with such 
methods. However, given a better understanding of what roles AI can be used to 
play in games, this neglect of methods and perspectives is hopefully coming to an 
end in both academia and industry. 


3.2 Game Design and AI Design Considerations 


When choosing an AI method for playing a particular game (in any of the roles
discussed in Section 3.1) it is crucial to know the characteristics of the game you
are playing and the characteristics of the algorithms you are about to design. These
collectively determine what type of algorithms can be effective. In this section we
first discuss the challenges we face due to the characteristics of the game per se
(Section 3.2.1) and then we discuss aspects of AI algorithmic design (Section 3.2.2)
that need to be considered independently of the game we examine.


3.2.1 Characteristics of Games 

In this section we discuss a number of characteristics of games and the impact they 
have on the potential use of AI methods. All characteristics covered are tied to the 
design of the game but a few (e.g., input representation and forward model) are also 
dependent on the technical implementation of the game and possibly amenable to 
change. Much of our discussion is inspired by the book Characteristics of Games
by Elias et al. [192], which discusses many of these factors from a game design
perspective. For illustrative purposes, Fig. 3.2 places a number of core game examples
onto the three-dimensional space of observability, stochasticity and time
granularity.


3.2.1.1 Number of Players 

A good place to start is the number of players a game has. Elias et al. [192] distinguish
between:



Fig. 3.2 Characteristics of games: game examples across the dimensions of stochasticity, observability
and time granularity. Note that the game examples presented are sorted by complexity (action
space and branching factor) within each cube. Minimax can theoretically solve virtually any
deterministic, turn-based game of perfect information (red cube in the figure); in practice, it is
still impossible to solve games with substantially large branching factors and action spaces such as
Go via Minimax. Any AI method that eventually approximates the Minimax tree (e.g., MCTS) can
be used to tackle imperfect information, non-determinism and real-time decision making (see blue
cubes in figure). Strictly speaking, Super Mario Bros (Nintendo, 1985) involves a small degree of
non-determinism only when a player helps create a particular scene; we can, thus, safely classify
the game as deterministic nm. 


• single-player games, such as puzzles and time-trial racing; 

• one-and-a-half-player games, such as the campaign mode of an FPS with non- 
trivial NPCs; 

• two-player games, such as Chess, Checkers and Spacewar! (Russell, 1962); and 

• multi-player games, such as League of Legends (Riot Games, 2009), the Mario
Kart (Nintendo, 1992-2014) series and the online modes of most FPS games.

The distinction between single-player and one-and-a-half-player games is not a
sharp one: there is no clear boundary for how advanced NPCs should be to count
as a “half player”. In the case of multi-player games, many can be played with only
two players, at which point they are effectively two-player games. It is not always
the case that other players (or NPCs) are adversarial and try to stop the player;
there are many collaborative games, or games where relations between players are
complex and have elements of both competition and cooperation. Still, keeping the
number of players in mind is very useful when thinking about algorithms for playing
games.

When using tree search algorithms to play games, some algorithms fit particularly
well with some numbers of players. Standard single-agent tree search algorithms
such as breadth-first, depth-first and A* fit the single-player case particularly
well (including games which have NPCs, but where those NPCs are so simple and 
predictable as to be treated as part of the environment). In such games, what happens 
in the game is determined entirely by the actions the player takes and any potential 
random effects; there are no other “intentional” players. This fits very well with 
single-agent tree search algorithms, which are based on the Markov property, that 
the next state is entirely determined by the previous state and the action taken at that 
point. 

A particular case is two-player zero-sum adversarial games, i.e., there are ex- 
actly two players; one player will win, the other will lose (or perhaps there will be 
a draw). We do not know what the other player will do, but we can safely assume 
that she will do everything she can to win, and thereby deny you the victory. The 
Minimax algorithm (with or without α-β pruning) is perfectly suited to this case,
and will lead to optimal play given sufficient computation time. 
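
To make the two-player zero-sum case concrete, here is a minimal sketch of Minimax with α-β pruning. The game is an assumption chosen only because it is tiny: a Nim-like game in which players alternately remove 1, 2 or 3 stones and whoever takes the last stone wins. The function names `legal_moves`, `alphabeta` and `best_move` are illustrative and do not come from the text.

```python
from math import inf


def legal_moves(stones):
    """Hypothetical toy game: a move removes 1, 2 or 3 stones."""
    return [n for n in (1, 2, 3) if n <= stones]


def alphabeta(stones, maximizing, alpha=-inf, beta=inf):
    """Game value from the maximizing player's view: +1 for a win, -1 for a loss."""
    if stones == 0:
        # The player who just moved took the last stone and therefore won.
        return -1 if maximizing else 1
    if maximizing:
        value = -inf
        for n in legal_moves(stones):
            value = max(value, alphabeta(stones - n, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # the minimizing player will never allow this branch
        return value
    value = inf
    for n in legal_moves(stones):
        value = min(value, alphabeta(stones - n, True, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:
            break  # the maximizing player already has a better alternative
    return value


def best_move(stones):
    """Pick the move whose resulting position has the highest Minimax value."""
    return max(legal_moves(stones), key=lambda n: alphabeta(stones - n, False))
```

In this toy game the search can reach every terminal state, so play is optimal: `best_move(5)` returns 1, leaving the opponent with four stones, which is a lost position for whoever must move. In games of realistic size the search is cut off at some depth and a heuristic state evaluation replaces the exact terminal value.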

But how do we cope with the challenge when we have many players, or perhaps 
a single player surrounded by very complicated non-player agents? While it is the- 
oretically possible to expand Minimax to multiple players, this only works if there 
can be no collusion (or alliances of any kind) between players and the zero-sum nature
of the game still remains (which it usually does not). Further, the computational
complexity of Minimax quickly gets unmanageable with more than two players, as 
for every move you take you do not just need to consider the countermove of one 
player, but of all players. So this approach is rarely workable. 

It is more common in the multi-player case to treat the game as a single-player 
game, but use some kind of model of what the other players do. This could be an 
assumption that the other players will oppose the player, a learned model based on 
observed behavior or even a random model. With an appropriate model of what the 
other players will do, many standard single-player game-playing methods can be
used in multi-player settings. 
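
As a rough illustration of the opponent-model idea, the following sketch learns a frequency model of another player from observed moves and then picks the action with the highest expected payoff against that model. Rock-paper-scissors is used only as a stand-in game, and the names `payoff`, `learned_model` and `best_response`, as well as the add-one smoothing, are assumptions made for this example.

```python
from collections import Counter

# Each move maps to the move it defeats.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def payoff(mine, theirs):
    """+1 for a win, 0 for a draw, -1 for a loss."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


def learned_model(observed_moves):
    """Frequency model of the other player, with add-one smoothing (an assumption)."""
    counts = Counter(observed_moves)
    total = len(observed_moves) + len(BEATS)
    return {move: (counts[move] + 1) / total for move in BEATS}


def best_response(model):
    """Treat the other player as part of the environment and maximize expected payoff."""
    def expected(mine):
        return sum(p * payoff(mine, theirs) for theirs, p in model.items())
    return max(BEATS, key=expected)
```

Against an opponent observed to play rock eight times out of ten, `best_response` selects paper. A random model corresponds to a uniform distribution, and an adversarial assumption corresponds to replacing the expectation with a minimum over the other player's moves.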


3.2.1.2 Stochasticity 

A common way in which many games violate the Markov property is by being 
stochastic (or non-deterministic). In many games, some of what happens is random.
As standard digital computer architectures do not allow for “true” randomness,
effective randomness is provided by pseudo-random number generators. The
word “stochastic” is used to denote processes which cannot be practically predicted,
whether they result from true randomness or complex calculations. Games can have
varying amounts of stochasticity, from completely deterministic games like Chess
to games dominated by stochastic outcomes like Roulette, Ludo, Yahtzee or even
Monopoly. It is common for games to have mostly or fully deterministic game
mechanics combined with some stochastic element through card-drawing, dice-throwing
or some similar mechanism to reduce the possibility of planning. However,
stochasticity can occur in essentially any part of a game. 

In a game with stochasticity, the outcome of the game is not entirely determined 
by the actions the players take. In other words, if you play several playthroughs 
of the same game, taking the same actions at the same points in time, you are not 
guaranteed the same outcome. This has consequences for AI algorithms. For tree 
search algorithms, it means that we cannot be sure about the state which a sequence 
of actions will lead to, and therefore about the results of the algorithm. This leads to 
problems with using many tree search algorithms in their canonical forms, and requires
that we add some modifications to address the non-deterministic uncertainty
in the forward model. For example, in Monte Carlo tree search modifications such
as determinization are used, where the different possible outcomes of each action 
are explored separately fTH . While these algorithm variations can be effective, they 
generally increase the computational complexity of the base algorithm. 

For reinforcement learning approaches, including evolutionary reinforcement 
learning, it means that we have reduced certainty in exactly how good a given strat- 
egy/policy is—a good policy may achieve bad outcomes, or a bad policy good 
outcomes, because of random events in the game. Such outcome uncertainty can 
be mitigated by evaluating every policy multiple times, though this has significant 
computational cost. On the other hand, stochasticity can sometimes actually be an 
advantage when learning policies: a policy which is learned for a stochastic game 
may be more robust than one learned for a deterministic game, as in the latter case 
it is possible to learn a very brittle policy that only works for a specific configuration
of the game: for example, a policy which attacks enemies in specific
places at specific times, rather than being able to handle enemies that might arrive
from any direction at any time.
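
The point about evaluating every policy multiple times can be sketched in a few lines. The "game" here is a deliberately artificial assumption: a policy is reduced to a single skill number and each playthrough adds Gaussian luck to it. The names `play_episode` and `evaluate` are hypothetical.

```python
import random
from statistics import mean


def play_episode(policy_skill, rng):
    """Hypothetical stochastic game: the score is the policy's skill plus random luck."""
    return policy_skill + rng.gauss(0.0, 1.0)


def evaluate(policy_skill, episodes, rng):
    """Estimate policy quality as the mean score over several playthroughs."""
    return mean(play_episode(policy_skill, rng) for _ in range(episodes))
```

With a single episode, a policy of skill 0.5 will frequently outscore one of skill 1.0 because the luck term dominates. The standard error of the estimate shrinks only with the square root of the number of episodes, which is where the significant computational cost mentioned above comes from.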

While it is very common for digital games to include some form of stochasticity,
an interesting case is very early games hardware such as the 1977-vintage
Atari 2600, which does not have the facilities for implementing pseudo-random
number generators (mainly because it lacks a system clock). If a player takes exactly
the same actions at exactly the same times (including the key press that starts
the game), exactly the same outcome will be achieved. The Arcade Learning Environment
is a widely used game-based AI benchmark built around an emulator of
the Atari 2600 EOl. When training AI agents to play games with no stochasticity, 
it is entirely possible to learn brittle policies that effectively bypass the complexities
of the full game (whether this actually happens, or whether most agents learn
more general strategies, is an open question). As we already saw across the various
examples of Chapterj^Ms Pac-Man (Namco, 1982) is arguably the most popular 
non-deterministic arcade game of that era.


3.2.1.3 Observability

Observability is a characteristic that is strongly related to stochasticity. It refers to 
how much information about the game state is available to the player(s). At one 
extreme we have classic board games such as Chess, Go and Checkers, where the 
full board state is always available to the players, and puzzles such as Sudoku and 
Spelltower. These games have perfect information. At the other extreme we can 
think of classic text adventures such as Zork (Personal Software, 1980) or Colossal 
Cave Adventure, where initially very little of the world and its state is revealed to 
the player and much of the game is about exploring what the world is like. Those 
games have hidden information and therefore only partial observability. Many,
if not most, computer games have significant hidden information: think of a typical 
platform game such as Super Mario Bros (Nintendo, 1985), or FPS such as the Halo 
series (Microsoft Studios, 2001-2015), where at any point you can only perceive a 
small part of the game world. Within computer strategy games such as StarCraft 
II (Blizzard Entertainment, 2015) or Civilization (MicroProse, 1991) the common 
term for hidden information is fog of war. Even many classic non-digital games have
hidden information, including most card games where players keep their hands of 
cards private (such as Poker) and board games such as Battleship. 

When developing an AI agent for a game with hidden information, the simplest 
approach you can follow is to merely ignore the hidden information. At each point in 
time, just feed the available information to the agent and use that to decide the next 
action. Doing so actually works quite well in some games, notably action-focused 
games with linear levels; for example, simple Super Mario Bros (Nintendo, 1985) 
levels can be played well based on only instantaneously available information [706].
However, if you play a strategy game such as StarCraft (Blizzard Entertainment, 
1998) based on only the available information, you are not even going to see the 
enemy until it is too late—good play involves active information gathering. Even 
in Super Mario Bros (Nintendo, 1985), complicated levels that feature backtracking 
require remembering off-screen parts of the level [322]. In a card game
such as Poker the available information (your own hand) is actually of comparatively
less relevance; the core of the game is modeling the hidden information (your
adversaries’ hands and minds).

Therefore, effective AI for games with partial observability often requires some 
kind of modeling of the hidden information. For some games, notably variants
of Poker such as Heads-up limit hold’em, considerable research has been done on 
game-specific methods for modeling hidden information that includes opponents’ 
play 1^ . There are also more generic methods of adding some form of hidden 
state modeling to existing algorithms, such as Information Set Monte Carlo tree 
search [146]. Just like when it comes to methods for dealing with stochasticity, these
methods typically add considerable computational complexity compared to the base
algorithm.


3.2.1.4 Action Space and Branching Factor

When you play the minimalist-masochist mobile game Flappy Bird (dotGEARS, 
2013), you have a single choice at any point in time: to flap or not to flap. (Flapping 
is accomplished by touching the screen, and makes the protagonist bird rise in the 
air.) Flappy Bird and similar one-button games, such as Canabalt (Beatshapers, 
2009), probably have the lowest possible branching factor. The branching factor 
is the number of different actions you can take at any decision point. The branching 
factor of Flappy Bird is 2: flap or no flap. 

For comparison, Pac-Man (Namco, 1980) has a branching factor of 4: up, down, 
left and right. Super Mario Bros (Nintendo, 1985) has a branching factor of around
32: eight D-pad directions times the four on/off combinations of two buttons (though
you may argue that some of these combinations are nonsensical and should not
actually be considered). Chess has an average branching factor of 35, whereas
Checkers has a somewhat lower branching factor. Go has a whopping branching
factor of roughly 400 for the very first move (361 on the standard 19×19 board); as
the board is populated, the branching factor decreases, but there are typically a few
hundred potential positions to put every stone.

While 400 is a very high branching factor compared to 35 or 2, it pales in comparison
to many computer strategy games where multiple units can be moved every
turn. Considering each combination of movements of individual units as an action,
this means that the branching factor of the game is the product of the branching
factors of the individual units. If you have 6 different units that can each take 10
different actions at a given time, a rather conservative estimate compared to typical
games of, say, StarCraft (Blizzard Entertainment, 1998) or Civilization (MicroProse,
1991), then your branching factor is a million!
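
The arithmetic behind the "million" can be checked directly. The helper name `joint_branching_factor` is an assumption; the numbers (6 units, 10 actions each) are the ones used in the paragraph above.

```python
from itertools import product
from math import prod


def joint_branching_factor(actions_per_unit):
    """Branching factor when one action combines a choice for every unit."""
    return prod(actions_per_unit)


# Six units with ten options each, as in the text.
print(joint_branching_factor([10] * 6))  # 1000000

# For a small case the joint actions can still be enumerated explicitly.
joint_actions = list(product(range(3), repeat=2))  # two units, three options each
print(len(joint_actions))  # 9
```

Adding a seventh unit multiplies the count by ten again, which is why explicit enumeration of joint actions stops being an option long before a strategy game reaches a realistic army size.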

But wait, it gets worse. For many games, it is not even possible to enumerate all 
the actions, as the input space is continuous. Think of any modern first-person game 
played on a computer or console. While it is true that computers do not really capture 
infinities well, and that “continuous” inputs such as computer mice, touchscreens 
and thumbsticks (on e.g., Xbox and PlayStation controllers) actually return a digital
number, that number has such fine resolution that it is for all practical purposes 
continuous. The only way to create a practically enumerable set of actions is to 
discretize the continuous input space somehow, and strike a compromise between 
overwhelming branching factors and reducing the input space so much as to not be 
able to play the game effectively. 
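
A minimal sketch of such a discretization follows, assuming a controller with one two-axis thumbstick whose values lie in [-1, 1] and a small number of on/off buttons. The function names and the choice of evenly spaced levels are assumptions; real games may well benefit from uneven spacing.

```python
def discretize_axis(bins):
    """Evenly spaced stick values in [-1, 1]; requires bins >= 2."""
    return [-1.0 + 2.0 * i / (bins - 1) for i in range(bins)]


def snap(value, levels):
    """Map a continuous input to the nearest discrete level."""
    return min(levels, key=lambda level: abs(level - value))


def action_set(stick_bins, buttons):
    """All (x, y, button_mask) combinations for one stick and some on/off buttons."""
    axis = discretize_axis(stick_bins)
    return [(x, y, mask) for x in axis for y in axis for mask in range(2 ** buttons)]
```

The compromise is visible in the sizes: with two buttons, three levels per axis yield 36 actions, while nine levels per axis already yield 324. Coarser bins keep the branching factor manageable but make fine aiming impossible.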

The branching factor is a key determinant of the effectiveness of tree search algorithms.
The complexity of the breadth-first algorithm (for single-player games)
and the Minimax algorithm (for adversarial two-player games) for searching to
depth d is $O(b^d)$, where b is the branching factor. In other words, a high branching
factor makes it almost impossible to search more than a few steps ahead. This fact
has very tangible consequences for which games can be played with tree search
methods; for example, Go has an order of magnitude higher branching factor than
Chess, and this was arguably the main reason for the very poor performance of all
kinds of AI methods on Go for decades (during which the same methods performed
well on Chess). Monte Carlo tree search handles high branching factors better because
it builds imbalanced trees, but it is by no means immune to the problem. Once
the branching factor gets high enough (say, a million, or maybe a billion, depending
on the speed of the simulator), it becomes impractical to enumerate even the actions
at depth 1 and therefore to use tree search at all.
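
To get a feel for this exponential growth, the sketch below counts the nodes of a full search tree and asks how deep one can search within a fixed node budget. The budget of one billion nodes is an arbitrary assumption, the branching factors are the rough figures quoted earlier in this section, and the function names are illustrative.

```python
def nodes_to_depth(b, d):
    """Number of nodes in a full tree with branching factor b, counting depths 0..d."""
    return sum(b ** k for k in range(d + 1))


def max_depth(b, budget):
    """Deepest full search that visits at most `budget` nodes."""
    d = 0
    while nodes_to_depth(b, d + 1) <= budget:
        d += 1
    return d


for name, b in [("Flappy Bird", 2), ("Chess", 35), ("Go (opening)", 400)]:
    print(name, max_depth(b, 10 ** 9))
```

Under these assumptions the same budget buys a lookahead of 28 steps at a branching factor of 2, five at 35 and only three at 400. Pruning and selective expansion improve on these figures, but they do not change the exponential character of the problem.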

High branching factors are an issue for reinforcement learning algorithms as
well, including evolutionary reinforcement learning. Mostly this has to do with the
controller/policy representation. If you are using a neural network (or some other
function approximator) for representing your policy, you may need to have outputs
for each action; alternatively, if you are using the network to assign values to all
actions, you need to iterate over them. In both cases, a large number of possible
actions carries a cost. Another problem is the exploration-exploitation dilemma; the
higher the number of possible actions, the longer it will take to explore them all
while learning.

A final comment on branching factors is that for many games, they are not constant.
In Chess, you have fewer moves available at the beginning of the game when
most of your pieces are blocked, more towards the midgame, and fewer again in 
the endgame when most pieces may be blocked. In a typical RPG such as those in 
the Final Fantasy series (Square Enix, 1987-2016) the number of available actions 
increases as the player character accrues items, spells and other possibilities. As 
mentioned above, the number of available actions in Go decreases as you play. 


3.2.1.5 Time Granularity 

When discussing branching factors above, we talked about the number of possible 
actions to take at any “point in time”. But how often is that? How often can the player 
take an action? A fundamental distinction is that between turn-based and real-time 
games. Most classic board games are turn-based games. In such games, players take 
turns, and at each turn a player can take an action, or a specified number of actions.
The amount of real time that passes between turns is generally not of any importance 
inside the game (though tournaments and professional play often incorporate some 
form of time limit). Real-time games include many popular genres of computer 
games, such as FPS, racing games and platformers. Even within real-time games,
there is considerable variation in how often an in-game action can in practice be 
taken. At the extreme there is the screen update frequency; the current generation of 
video games typically strives to have an update frequency of 60 frames per second 
to ensure a perceived smooth movement, but many games update the screen half as 
often or even less because of the complexity of rendering complicated scenes. In 
practice, the number of actions a player character (or any other in-game character) 
could take per second is usually more limited than that. 

To take two examples far apart on the time granularity scale, let us consider two
adversarial games: Chess and StarCraft (Blizzard Entertainment, 1998). A game of
Chess between skilled players on average lasts about 40 turns.* In StarCraft (Blizzard
Entertainment, 1998), a highly competitive real-time strategy (RTS) game, professional
players often take three to five actions per second (each action is typically
executed with a mouse click or a shortcut key). With a typical game lasting 10 to
20 minutes, this means that thousands of actions are taken in a game. But there are
not that many more significant events in a game of StarCraft (Blizzard Entertainment,
1998) than in a game of Chess: the lead does not change much more often,
and grand strategic decisions are not made that much more often. This means that
the number of actions between significant game events is much higher in StarCraft
(Blizzard Entertainment, 1998) than in Chess.

* http://chess.stackexchange.com/questions/2506/

Time granularity affects AI game-playing methods through limiting how far 
ahead you can look. A given depth of search means very different things depending 
on the time granularity of the game. Ten turns in Chess is enough to execute a whole 
strategy; ten actions ahead in StarCraft (Blizzard Entertainment, 1998) might just 
be a few seconds, during which the game might not have changed in any significant
way. To play StarCraft (Blizzard Entertainment, 1998) well using tree search,
one would need an exceptional search depth, in the hundreds or thousands of actions,
which would clearly be computationally infeasible. One way to address this
challenge is to consider macro-actions (e.g., as in [525, 524]), which are sets or
sequences of smaller, fine-grained, actions.
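
The effect of macro-actions on search depth can be sketched with a toy example. Everything here is an assumption made for illustration, including the one-dimensional world, the three macros of ten primitive steps each, and the names `step`, `apply_macro` and `plan`; it is not the method of the cited works.

```python
def step(state, action):
    """Hypothetical forward model: position on a line, action is -1, 0 or +1."""
    return state + action


# Each macro-action is a fixed sequence of ten primitive actions.
MACROS = {
    "advance": [1] * 10,
    "retreat": [-1] * 10,
    "hold": [0] * 10,
}


def apply_macro(state, name):
    """Roll the forward model through every primitive action of a macro."""
    for action in MACROS[name]:
        state = step(state, action)
    return state


def plan(state, goal, depth):
    """Exhaustive search over macro sequences; returns (distance to goal, sequence)."""
    if depth == 0:
        return abs(goal - state), []
    best = None
    for name in MACROS:
        cost, rest = plan(apply_macro(state, name), goal, depth - 1)
        if best is None or cost < best[0]:
            best = (cost, [name] + rest)
    return best
```

Calling `plan(0, 30, 3)` examines only 27 macro sequences yet looks 30 primitive steps ahead; searching the same horizon over primitive actions would involve 3^30 sequences. The price is expressiveness, since any behavior that cannot be composed from the available macros is invisible to the search.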


3.2.2 Characteristics of AI Algorithm Design 

In the following, we discuss some important issues in applying AI algorithms to 
games. These are design choices relating not so much to game design (covered in 
the previous section), as to AI algorithm design and the constraints under which 
the algorithm is used. This section expands the discussion about representation and 
utility covered in Chapterj^with a focus on game-playing AI. 


3.2.2.1 How Is the Game State Represented? 

Games differ in what information they present to the player, and how. Text adven- 
tures output text, the state of a classic board game can be described by the positions 
of all board pieces, and graphical video games serve moving graphics together with
sound and occasionally outputs such as controller rumble. For digital games, the
technical limitations of the hardware on which the game is implemented influence
how it is presented; as processor speed and memory capacity have increased, the pixel
resolution and scene complexity of video games have increased commensurately.

Importantly, the same game can be represented in different ways, and which way 
it is represented matters greatly to an algorithm playing the game. To take a rac- 
ing game as an example, the algorithm could receive a first-person view out of the 
windscreen of the car rendered in 3D, or an overhead view of the track rendering 
the track and various cars in 2D. It could also simply receive a list of positions and
velocities of all cars on the track in the frame of reference of the track (along with a
model of the track), or a set of angles and distances to other cars (and track edges)
in the frame of reference of the car.

The choices regarding input representation matter a lot when designing a game. 
If you want to learn a policy for driving a car around a track, and the inputs to the 
policy are the three continuous variables associated with speed and distance to the 
left and right edge of the track, learning a decent driving policy is comparatively 
simple. If your input is instead an unprocessed visual feed—i.e., tens of thousands 
of pixel values—finding a decent policy is likely to be much harder. Not only is 
the policy search space in the latter case vastly greater, the proportion of the search 
space that corresponds to decently-performing policies is likely to be much smaller, 
as many more nonsensical policies are possible (e.g., turn left if even-numbered 
pixels are lighter than odd-numbered pixels; this policy does not map to the game 
state in any sensible way, and if it works it is a fluke). In order to learn to drive 
well based on visual input—at least in cases where illumination, roadside scenery, 
etc. vary significantly—you likely need to learn a visual system of some kind. In 
light of this, most naive policies applied to full visual input would likely not
fare very well. Continuing the car racing example, even in cases where you have
very few inputs, how these are represented matters; for example, it is much easier 
to learn a good driving policy if the inputs are represented in the frame of reference 
of the car rather than that of the track 0707117141 . A somewhat more comprehensive 
discussion on ways of representing low-dimensional inputs to neural networks can 
be found in 05671 . 

In recent years, several groups of researchers have focused on learning policies
that use full visual feeds as inputs. For example, Koutnik et al. evolved neural
networks to play The Open Racing Car Simulator (TORCS) from high-resolution 
video 13531, Kempka et al. used the pixels of the screen as input to a deep Q network 
that was trained to play a version of DOOM (GT Interactive, 1993) 03331 , and Mnih
et al. trained deep networks to play Atari 2600 games using Q-learning 04641 . Using 
the raw pixel inputs is often motivated by giving the AI the same conditions as a 
human would have, and thus achieving a level playing field between human and AI. 
Another motivation is that if you want to use your algorithm to play a game “out of 
the box”, without any API or additional engineering to expose the internal state of 
the game, you will likely have to resort to using the raw visual feed. However, in 
cases where you have access to the source code of the game or a useful API—as you 
would almost always have when developing AI for a new game—there is no reason 
to not utilize the “digested” game state in whatever form makes the task of the AI 
algorithm easiest. Whether or not to present information that the human player does 
not have access to, i.e., “cheating”, is a separate question. 


3.2.2.2 Is There a Forward Model? 

A very important factor when designing an AI to play a game is whether there is a 
simulator of the game, a so-called forward model, available. A forward model is a 
model which, given a state s and an action a, reaches the same state s' as the real
game would reach if it were given a at s. In other words, it is a way of playing the
game in simulation, so that consequences of multiple actions can be explored before 
actually taking some of those actions in the real game. Having a forward model of 
the game is necessary in order to be able to use any tree search-based approaches to 
playing a game, as those approaches depend on simulating the outcome of multiple 
actions. 
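
As a minimal sketch of what a forward model offers, consider the toy example below. The one-dimensional driving game, the `State` fields and the function names are all assumptions made for illustration; they do not come from any particular game or framework. The point is only that `forward_model(s, a)` returns the successor state s' without touching the real game, so that an agent can explore the consequences of several actions before committing to one.

```python
from typing import NamedTuple


class State(NamedTuple):
    position: int  # hypothetical 1D track position
    velocity: int  # hypothetical speed, clamped to 0..3


ACTIONS = (-1, 0, 1)  # brake, coast, accelerate (illustrative only)


def forward_model(state: State, action: int) -> State:
    """Return the successor state s' for (s, a) without running the real game."""
    velocity = max(0, min(3, state.velocity + action))
    return State(state.position + velocity, velocity)


def lookahead_action(state: State, depth: int) -> int:
    """Simulate every action sequence of length `depth` and return the first
    action of the sequence that ends furthest along the track."""

    def value(s: State, d: int) -> int:
        if d == 0:
            return s.position
        return max(value(forward_model(s, a), d - 1) for a in ACTIONS)

    return max(ACTIONS, key=lambda a: value(forward_model(state, a), depth - 1))
```

Calling `lookahead_action(State(0, 0), 3)` simulates all 27 three-step action sequences in the model and only then picks an action for the real game. The number of simulated sequences grows exponentially with depth, which is why the speed of the forward model matters so much.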

A very desirable property of a forward model, beyond simply existing, is that
it is fast. In order to be able to use a tree search algorithm effectively for control in
a real-time game, one would generally need to be able to simulate gameplay at least 
a thousand times faster than real-time, preferably tens or hundreds of thousands of 
times faster. 

It is very easy to construct a forward model for classic board games such as
Chess and Go, as the game state is simply the board state and the rules are very easy 
to encode. For many video games, constructing a forward model can be done by 
simply copying (or otherwise reusing) the same code as is used for controlling the 
game itself, but without waiting for user input or displaying graphics, and without
performing all the calculations involved with graphics rendering. For some video
games—notably games that were originally implemented for much older hardware, 
such as classic arcade games that were implemented on 8-bit or 16-bit processors— 
forward models can be made much faster than real-time, as the core game loop is not 
that computationally complex. (It might also be possible to do this with some
modern games, by running them inside emulators that can replace the graphics routines
with dummy code.) 

For many games, however, it is impossible or at least very hard to obtain a fast 
forward model. For most commercial games, the source code is not available, unless 
you are working at the company that develops the game. Even if the source code is 
available, current software engineering practices in the game industry make it very
hard to extract forward models from game code, as the core control loops are often
closely tied up with user interface management, rendering, animation and sometimes
network code. A change in software engineering practices to separate the core
game loop more cleanly from various input/output functions so that forward models 
could more easily be built would be one of the most important enablers of advanced 
AI methods in video games. However, in some cases the computational complexity 
of the core game loop might still be so high that any forward models built on the
core game code would be too slow to be usable. In some such cases, it might be
practical to build and/or learn a simplified or approximate forward model, where 
the state resulting from a series of actions taken in the forward model is not guaran- 
teed to be identical to the state resulting from the same series of actions in the actual 
game. Whether an approximate forward model is acceptable or not depends on the 
particular use case and motivation for the AI implementation. Note that a somewhat 
less than accurate forward model might still be desirable. For example, when there
is significant hidden information or stochasticity the AI designer might not want to 
provide the AI agent with an oracle that makes the hidden information observable 
and tells the agent which random actions will happen. Providing such an oracle
might lead to unreasonably good performance and the appearance of cheating.

When a forward model cannot be produced, tree search algorithms cannot be 
applied. It is still possible to manually construct agents, and also to learn agents
through supervised learning or some form of reinforcement learning, such as
temporal difference learning or evolutionary reinforcement learning. However, note
that while the reinforcement learning approaches in general do not need a complete
forward model in the sense that the results of taking any action in any state can be
predicted, they still need a way to run the game faster than real-time. If the game
cannot be sped up significantly beyond the pace at which it is naturally played, it is
going to take the algorithm a very long time to learn to play. 


3.2.2.3 Do You Have Time to Train? 

A crude but useful distinction in artificial intelligence is between algorithms that try
to decide what to do in a given situation by examining possible actions and future
states (roughly, tree search algorithms of various kinds) and algorithms that learn
a model (such as a policy) over time, i.e., machine learning. The same distinction
exists within AI for playing games. There are algorithms developed that do not 
need to learn anything about the game, but do need a forward model (tree search); 
there are algorithms that do not need a forward model, but instead learn a policy 
as a mapping from state(s) to action (model-free reinforcement learning); and there 
are algorithms that require both a forward model and training time (model-based 
reinforcement learning and tree search with adaptive hyperparameters). 

What type of algorithm you will want to use depends largely on your motivation 
for using AI to play games. If you are using the game as a testbed for your AI
algorithm, your choice will be dictated by the type of algorithm you are testing. If you
are using the AI to enable player experience in a game that you develop—for exam- 
ple, in a non-player role—then you will probably not want the AI to perform any 
learning while the game is being played, as this risks interfering with the gameplay 
as designed by the designer. In other cases you are looking for an algorithm that can 
play some range of games well, and do not have time to retrain the agent for each 
game. 


3.2.2.4 How Many Games Are You Playing? 

An aim the AI designer might wish to achieve is that of general game playing. 
Here, we are not looking for a policy for a single game, we are looking for a more 
generic agent that can play any game that it is presented with—or at least any game 
from within a particular distribution or genre, and which adheres to a given interface. 
General game playing is typically motivated by a desire to use games to progress 
towards artificial general intelligence, i.e., developing AI that is not only good at one
thing but at many different things 059811679117441 . The idea is to avoid overfitting
(manually or automatically) to a given game and come up with agents that generalize
well to many different games; it is a common phenomenon that when developing 
agents for a particular game, for example, for a game-based AI competition, many 
special-purpose solutions are devised that do not transfer well to other games GoD.

For this reason, it is common to evaluate general game playing agents on un- 
seen games, i.e., games on which they have not been trained and which the design- 
ers of the agent were not aware of when developing the agent. There are several 
frameworks for general game playing, including the General Game Playing
Competition 02231 , the General Video Game AI Competition 0528115271 and the Arcade
Learning Environment ®1. These will be discussed later in the chapter.

General video game playing, where AI is developed to play for performance in 
the player role across many games (ideally all games), can be seen as diametrically
opposed to the typical use of AI for playing games in commercial game develop- 
ment, where the AI is playing for experience in a non-player role, and is carefully 
tuned to a particular game. However, the development of AI methods for general 
game playing certainly benefits commercial game AI in the end. And even when 
developing game AI as part of game development, it is good engineering practice to 
develop methods that are reusable to some extent. 


3.3 How Can AI Play Games? 

In Chapter 2 we reviewed a number of important AI methods. Most of these methods
can be used to play games in one way or another. This section will focus on the
core AI methods, and for each family of algorithms it will go through how they
can be used to play games.


3.3.1 Planning-Based Approaches 

Algorithms that select actions through planning a set of future actions in a state 
space are generally applicable to games, and do not in general require any training 
time. They do require a fast forward model if searching in the game’s state space, 
but not if simply using them for searching in the physical space (path-planning). 
Tree search algorithms are widely used to play games, either on their own or in 
supporting roles in game-playing agent architectures. 


3.3.1.1 Classic Tree Search 

Classic tree search methods, which feature little or no randomness, have been used 
in game-playing roles since the very beginning of research on AI and games. As 
mentioned in the introduction of this book, the Minimax algorithm and α-β pruning
were originally invented in order to play classic board games such as Chess and
Checkers 07251 . While the basic concepts of adversarial tree search have not really 
changed since then, there have been numerous tweaks to existing algorithms and 
some new algorithms. In general, classic tree search methods can easily be applied 
in games that feature full observability, a low branching factor and a fast forward 
model. Theoretically they can solve any deterministic game that features full
observability for the player (see the red cube in Fig. 3.2); in practice, they still fail in
games containing large state spaces.
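
To make the idea concrete, here is a small sketch of Minimax with α-β pruning. It is applied to a deliberately tiny game chosen for this illustration (a Nim variant in which players alternately take one to three stones and whoever takes the last stone wins), not to Chess or Checkers. The function names are assumptions of this sketch. The recursive call on `pile - take` plays the role of the forward model.

```python
def alphabeta(pile: int, maximizing: bool,
              alpha: float = float("-inf"), beta: float = float("inf")) -> float:
    """Game value from the maximizing player's point of view: +1 win, -1 loss."""
    if pile == 0:
        # The previous player took the last stone, so the player to move has lost.
        return -1.0 if maximizing else 1.0
    if maximizing:
        value = float("-inf")
        for take in range(1, min(3, pile) + 1):
            value = max(value, alphabeta(pile - take, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:  # the minimizer will never allow this branch
                break
        return value
    value = float("inf")
    for take in range(1, min(3, pile) + 1):
        value = min(value, alphabeta(pile - take, True, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:  # the maximizer already has a better option elsewhere
            break
    return value


def best_move(pile: int) -> int:
    """Number of stones to take, chosen by full-depth search."""
    return max(range(1, min(3, pile) + 1),
               key=lambda take: alphabeta(pile - take, False))
```

For this game the search reaches terminal states, so no state evaluation function is needed. In games with larger state spaces the search would have to be cut off at some depth and the leaves scored heuristically.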

Best-first search, in particular a myriad variations of the A* algorithm, is very 
commonly used for path-planning in modern video games. When an NPC in a 
modern 3D FPS or RPG decides how to get from point A to point B, this is typically
done using some version of A*. In such instances, search is usually done in
(in-game) physical space rather than state space, so no forward model is necessary. 
As the space is pseudo-continuous, search is usually done on the nodes of a mesh 
or lattice overlaid on the area to be traversed. Note that best-first search is only used 
for navigation and not for the full decision-making of the agent; methods such as 
behavior trees or finite-state machines (which are usually hand-authored) are used to 
determine where to go, whereas A* (or some variation of it) is used to determine how 
to get there. Indeed, in games where player input is by pointing to and clicking at 
positions to go—think of an overhead brawler like Diablo (Blizzard Entertainment, 
1996) or an RTS like StarCraft (Blizzard Entertainment, 1998)—the execution of 
the player’s order usually involves a path-planning algorithm as well. Recent addi- 
tions to the family of best-first algorithms include jump point search (JPS), which 
can improve performance by orders of magnitude compared to standard A* under
the right circumstances 066211 . Hierarchical pathfinding is its own little research area 
based on the idea of dividing up an area into subareas and using separate algorithms 
for deciding how to go between and within the areas. Choosing a path-planning
algorithm for a modern video game is usually a matter of choosing the algorithm that
works best given the shape of the environments the NPCs (or PCs) are traversing,
the particular way a grid or movement graph is overlaid on top of this space, and
the demands of the animation algorithm. Generally, one size does not fit all; some
textbooks devoted to industry-oriented game AI discuss this in more depth 114611 . 
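
The following is a minimal sketch of A* used for path-planning in physical space, under some simplifying assumptions that are ours rather than any particular engine's: the level is a small grid of equal-length strings in which `#` marks a wall, movement is four-directional with unit cost, and the heuristic is the Manhattan distance to the goal. Production pathfinders typically search navigation meshes and add many optimizations.

```python
import heapq


def astar(grid, start, goal):
    """Return a shortest list of (row, col) cells from start to goal, or None."""

    def heuristic(cell):
        # Manhattan distance never overestimates with 4-way unit-cost moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]  # entries are (f, g, cell)
    came_from = {start: None}
    g_cost = {start: 0}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:  # walk the parent links back to the start
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if g > g_cost[cell]:
            continue  # stale queue entry; a cheaper route was found later
        row, col = cell
        for nxt in ((row + 1, col), (row - 1, col), (row, col + 1), (row, col - 1)):
            r, c = nxt
            if not (0 <= r < len(grid) and 0 <= c < len(grid[0])):
                continue
            if grid[r][c] == "#":
                continue
            new_g = g + 1
            if new_g < g_cost.get(nxt, float("inf")):
                g_cost[nxt] = new_g
                came_from[nxt] = cell
                heapq.heappush(open_heap, (new_g + heuristic(nxt), new_g, nxt))
    return None
```

Note that no forward model of the game appears anywhere in this sketch. The search runs over positions in the level, which is why this kind of path-planning works even for games that cannot be simulated quickly.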

Beyond path-planning, best-first algorithms such as A* can be used for controlling
all aspects of NPC behavior. The key to doing this is to search in the state
space of the game, not just the physical space. (Obviously, this requires a fast
forward model.) To take an example, the winner of the 2009 Mario AI Competition
was entirely based on A* search in state space B705II . This competition tasked
competitors with developing agents that could play a Java-based clone of the classic
platform game Super Mario Bros (Nintendo, 1985); it later evolved into a multi-track
competition II322I . While a forward model was not supplied with the original
competition software, the winner of the competition, Robin Baumgarten, created
one by adapting parts of the core game code. He then built an A* agent which at
any point simply tried to get to the right edge of the screen. (An illustration of the
agent can be seen in Figs. 3.3 and 2.5.) This worked extremely well; the resulting
agent played seemingly optimally, and managed to get to the end of all levels
included in the competition software. A video showing the agent navigating one of
the levels gathered more than a million views on YouTube
(https://www.youtube.com/watch?v=DlkMs4ZHHr8); the appeal of seeing the
agent playing the game is partly the extreme skill of the agent in navigating among
multiple enemies.

Fig. 3.3 An illustration of the key steps of A* search for playing Super Mario Bros (Nintendo,
1985). The agent considers a maximum of nine possible actions at each frame of the game, as a
result of combining the jump and speed buttons with moving right or left (top figure). Then the
agent picks the action with the highest heuristic value (middle figure). Finally, the Mario agent
takes the action (i.e., right, jump, speed in this example), moves to a new state and evaluates the
new action space in this new state (bottom figure). More details about the A* agent that won the
Mario AI competition in 2009 can be found in f705l .

It is important to note that the success of this agent is due to several factors. One 
is that the levels are fairly linear; in a later edition of the competition, levels with 
dead ends which required back-tracking were introduced, which defeated the pure 
A* agent II322I . Two other factors are that Super Mario Bros (Nintendo, 1985) is 
deterministic and has locally perfect information (at any instant the information in 
the current screen is completely known) and of course that a good forward model is 
available; if the A* agent had not used a complete model of the game including
the movement of enemies, it would have been impossible to plan paths around these 
enemies. 


3.3.1.2 Stochastic Tree Search 

The MCTS algorithm burst onto the scene of Go research in 2006 014111771 , and
heralded a quantum leap in performance of Go-playing AI. Classic adversarial search
had performed poorly on Go, partly because the branching factor is too high (about 
an order of magnitude higher than Chess) and partly because the nature of Go makes 
it very hard to algorithmically judge the value of a board state. MCTS partly over- 
comes these challenges by building imbalanced trees where not all moves need to 
be explored to the same depth (reduces effective branching factor) and by doing 
random rollouts until the end of the game (reduces the need for a state evaluation 
function). The AlphaGo 06291 software, which beat two of the best human Go
players in the world in 2016 and 2017, is built around the MCTS algorithm.
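
The four phases of a basic MCTS iteration (selection, expansion, random rollout, backpropagation) can be sketched as follows. This is an illustrative UCT variant written for a tiny Nim game (take one to three stones, last stone wins), not the algorithm used in any of the Go programs mentioned above. The class and function names, the exploration constant and the iteration count are all assumptions of the sketch.

```python
import math
import random


class Node:
    def __init__(self, pile, parent=None, move=None):
        self.pile = pile      # stones left; the player to move takes 1-3
        self.parent = parent
        self.move = move      # the move that led from the parent to this node
        self.children = []
        self.untried = [m for m in (1, 2, 3) if m <= pile]
        self.visits = 0
        self.wins = 0.0       # wins for the player who made `move`


def uct_child(node, c=1.4):
    """Pick the child with the best trade-off of win rate and exploration."""
    return max(node.children,
               key=lambda ch: ch.wins / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))


def rollout(pile, rng):
    """Random playout; True if the player to move at the start wins."""
    starter_to_move = True
    while pile > 0:
        pile -= rng.randint(1, min(3, pile))
        if pile == 0:
            return starter_to_move  # whoever took the last stone wins
        starter_to_move = not starter_to_move
    return False  # no stones left: the player to move has already lost


def mcts(root_pile, iterations=2000, seed=0):
    rng = random.Random(seed)
    root = Node(root_pile)
    for _ in range(iterations):
        node = root
        while not node.untried and node.children:  # 1. selection
            node = uct_child(node)
        if node.untried:                           # 2. expansion
            move = node.untried.pop()
            child = Node(node.pile - move, node, move)
            node.children.append(child)
            node = child
        to_move_wins = rollout(node.pile, rng)     # 3. simulation
        while node is not None:                    # 4. backpropagation
            node.visits += 1
            if not to_move_wins:  # the player who moved into this node won
                node.wins += 1
            to_move_wins = not to_move_wins
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move
```

The tree grows unevenly because `uct_child` keeps returning to moves that have done well so far, and the rollouts stand in for a state evaluation function. Both properties are what made the approach attractive for Go.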

The success of MCTS on Go has led researchers and practitioners to explore its 
use for playing a wide variety of other games, including trading card games 07461 ,
platform games 02941 , real-time strategy games 0311116451 , racing games 02031 and
so on. Of course, these games differ in many ways from Go. While Go is a
deterministic perfect information game, a real-time strategy game such as StarCraft
(Blizzard Entertainment, 1998), a trading card game such as Magic: The Gathering, 
or any Poker variant feature both hidden information and stochasticity. Methods 
such as Information Set Monte Carlo tree search are one way of dealing with these 
issues, but impose computational costs of their own 01461 . 

Another problem is that in games with fine time granularity, it might take a pro- 
hibitively long time for a rollout to reach a terminal state (a loss or a win); in many 
video games it is possible to take an arbitrary number of actions without winning 
or losing the game, or even doing something that materially affects the outcome 
of the game. For example, in Super Mario Bros (Nintendo, 1985), most randomly 
generated action sequences would not see Mario escaping the original screen, but 
basically pacing back and forth until time runs out, thousands of time steps later. 


^ https://www.youtube.com/watch?v=DlkMs4ZHHr8 
One response to this problem is to only roll out a certain number of actions, and if a
terminal state is not encountered use a state evaluation function irTTll . Other ideas
include pruning the action selection so as to make the algorithm search deeper 02941 .
Given the large number of modifications to all components of the MCTS algorithm, 
it makes more sense to think of MCTS as a general algorithmic framework rather 
than as a single algorithm. 

Many games could be played either through MCTS, uninformed search (such as 
breadth-first search) or informed search (such as A*). Deciding which method to 
use is not always straightforward, but luckily these methods are relatively simple 
to implement and test. Generally speaking, Minimax can only be used for (two- 
player) adversarial games whereas other forms of uninformed search are best used 
for single-player games. Best-first search requires some kind of estimate of a dis- 
tance to a goal state, but this does not need to be a physical position or the end goal 
of the game. Varieties of MCTS can be used for both single-player and two-player 
games, and often outperform uninformed search when branching factors are high. 


3.3.1.3 Evolutionary Planning 

Interestingly, decision making through planning does not need to be built on tree 
search. Alternatively, one can use optimization algorithms for planning. The basic 
idea is that instead of searching for a sequence of actions starting from an initial 
point, you can optimize the whole action sequence. In other words, you are searching
the space of complete action sequences for those that have maximum utility.
Evaluating the utility of a given action sequence is done by simply taking all the 
actions in the sequence in simulation, and observing the value of the state reached 
after taking all those actions. 

The appeal of this idea is that an optimization algorithm might search the plan 
space in a very different manner compared to a tree search algorithm: all tree search 
algorithms start from the root of the tree (the origin state) and build a tree from that 
point. Evolutionary algorithms instead regard the plan as simply a sequence, and 
can perform mutations or crossover at any point in the string. This could help
guide the search to different areas of the plan space than a tree search algorithm
would explore for the same problem.

While many different optimization algorithms could be used, the few studies on 
optimization-based planning in games that can be found in the literature use
evolutionary algorithms. Perez et al. proposed using evolutionary planning for
single-player action games, calling this approach “rolling horizon evolution” 15261. In the
particular implementation for the Physical Traveling Salesman Problem (a hybrid 
between the classic TSP problem and a racing game), an evolutionary algorithm 
was used to generate a plan every time step. The plan was represented as a sequence 
of 10-20 actions, and a standard evolutionary algorithm was used to search for plans.
After a plan was found, the first step of the plan was executed, just as would be the 
case with a tree search algorithm. Agents based on evolutionary planning generally 
perform competitively in the General Video Game AI Competition 05281 . 
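
A rough sketch of the rolling horizon idea is given below. It is not the implementation of Perez et al.; the toy game (a position on a line that should approach a target), the truncation-selection scheme, and all names and parameter values are assumptions chosen to keep the example short. What it shares with the approach described above is the loop: evolve a fixed-length plan using the forward model, execute only its first action, then plan again.

```python
import random

ACTIONS = (-1, 0, 1)


def forward_model(state, action):
    """Hypothetical 1D game: the state is simply a position on a line."""
    return state + action


def utility(state, plan, target):
    """Simulate the whole plan and score the state reached at the end."""
    for action in plan:
        state = forward_model(state, action)
    return -abs(target - state)


def rolling_horizon_step(state, target, rng, horizon=10, pop_size=20, generations=30):
    """Evolve a plan from `state` and return only its first action."""
    population = [[rng.choice(ACTIONS) for _ in range(horizon)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda plan: utility(state, plan, target), reverse=True)
        elite = population[: pop_size // 2]
        offspring = []
        for parent in elite:
            child = list(parent)
            # Point mutation: any position in the plan may change, not just the end.
            child[rng.randrange(horizon)] = rng.choice(ACTIONS)
            offspring.append(child)
        population = elite + offspring
    best = max(population, key=lambda plan: utility(state, plan, target))
    return best[0]


def play(start, target, steps, seed=0):
    rng = random.Random(seed)
    state = start
    for _ in range(steps):
        state = forward_model(state, rolling_horizon_step(state, target, rng))
    return state
```

Because mutation can alter any gene, the search can revise the beginning of a plan as easily as its end. A tree search, by contrast, commits to early actions first and extends the plan from there.
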
Evolutionary planning is particularly promising as a technique for handling very
large branching factors, which, as we have seen, games with multiple independent units
(such as strategy games) can have. Justesen et al. 13091 applied evolutionary
computation to select actions in the turn-based strategy game Hero Academy (Robot
Entertainment, 2012), calling this approach “online evolution”. Given the number
of units the player controls and the number of actions available per unit, the branching
factor is about one million; therefore, only a single turn ahead was planned.
Evolutionary planning was shown to outperform Monte Carlo tree search by a wide
margin in that game. Wang et al. II745L and Justesen and Risi 13101 later applied
variants of this technique to StarCraft (Blizzard Entertainment, 1998) tactics. Given
the continuous-space nature of the game, the branching factor would be extreme if
every possible movement direction for every unit was considered as a separate action.
What was evolved was therefore not a sequence of actions, but rather which
of several simple scripts (tactics) each unit would use in a given time step (this idea
was borrowed from Churchill and Buro, who combined a “portfolio” of scripts with
simple tree search 11231 ). Wang et al. 17451 showed that evolutionary planning
performed better than several varieties of tree search algorithms in this simple StarCraft
(Blizzard Entertainment, 1998) scenario.

Evolutionary planning in games is a recent invention, and there are only a limited 
number of studies on this technique so far. It is not well understood under what 
conditions this technique performs well, or even really why it performs so well 
when it does. A major unsolved problem is how to perform evolutionary adversarial 
planning 15861 : whereas planning based on tree search works in the presence of an 
adversary (for example, see the minimax algorithm), it is not clear how to integrate
this into a genotype. Perhaps through competitive coevolution of actions taken by 
different players? There is, in other words, plenty of scope for further research in 
this area. 


3.3.1.4 Planning with Symbolic Representations 

While planning on the level of in-game actions requires a fast forward model, there 
are other ways of using planning in games. In particular, one can plan in an abstract 
representation of the game’s state space. The field of automated planning has studied 
planning on the level of symbolic representations for decades 02281 . Typically, a 
language based on first-order logic is used to represent events, states and actions, and
tree search methods are applied to find paths from the current state to an end state.
This style of planning originated with the STRIPS representation used in Shakey,
the world’s first digital mobile robot 049 40 ; symbolic planning has since been used
extensively in numerous domains. 

The horror-themed first-person shooter F.E.A.R. (Sierra Entertainment, 2005)
became famous within the AI community for its use of planning to coordinate NPC
behavior. The game’s AI also received nice reviews in the gaming press, partly
because the player is able to hear the NPCs communicate with each other about their
plan of attack, heightening immersion. In F.E.A.R. (Sierra Entertainment, 2005),
a STRIPS-like representation is used to plan which NPCs perform which actions
(flank, take cover, suppress, fire, etc.) in order to defeat the player character. The
representation is on the level of individual rooms, where movement between one
room and the next is usually a single action IMl. Using this high-level representation,
it is possible to plan much further ahead than would be possible when planning
on the scale of individual game actions. Such a representation, however, requires
manually defining states and actions.
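
To illustrate what a STRIPS-like representation looks like in code, the sketch below describes each action by its preconditions, the facts it adds and the facts it deletes, and finds a plan by breadth-first forward search. The fact and action names are invented for this example and are not taken from F.E.A.R.; a real planner would also use a heuristic search rather than plain breadth-first search.

```python
from collections import deque
from typing import FrozenSet, NamedTuple


class Action(NamedTuple):
    name: str
    preconditions: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]


def plan(initial, goal, actions):
    """Breadth-first forward search; returns a shortest list of action names."""
    start, goal = frozenset(initial), frozenset(goal)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        facts, steps = frontier.popleft()
        if goal <= facts:  # every goal fact holds in this symbolic state
            return steps
        for act in actions:
            if act.preconditions <= facts:
                nxt = (facts - act.delete) | act.add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [act.name]))
    return None


# Hypothetical room-level actions for a single NPC.
NPC_ACTIONS = [
    Action("move_to_room_b", frozenset({"npc_in_room_a"}),
           frozenset({"npc_in_room_b"}), frozenset({"npc_in_room_a"})),
    Action("take_cover", frozenset({"npc_in_room_b"}),
           frozenset({"npc_in_cover"}), frozenset()),
    Action("suppress_player", frozenset({"npc_in_room_b", "npc_in_cover"}),
           frozenset({"player_suppressed"}), frozenset()),
]
```

With these definitions, `plan({"npc_in_room_a"}, {"player_suppressed"}, NPC_ACTIONS)` returns the three-step plan of moving, taking cover and suppressing. Each step is a single symbolic action, even though executing it in the game may take hundreds of frames.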


3.3.2 Reinforcement Learning 

As discussed in Chapter 2, a reinforcement learning algorithm is any algorithm that
solves a reinforcement learning problem. This includes algorithms from the temporal
difference or approximate dynamic programming family (for simplicity, we will
refer to such algorithms as classic reinforcement learning methods), applications of
evolutionary algorithms to reinforcement learning such as neuroevolution and genetic
programming, and other methods. In this section, we will discuss both classic
methods (including those that involve deep neural networks) and evolutionary methods
as they are applied for playing games. Another way of describing the difference
between these methods is the difference between ontogenetic (which learns during
“lifetimes”) and phylogenetic (which learns between “lifetimes”) methods 07151 .

Reinforcement learning algorithms are applicable to games when there is learning
time available. Usually this means plenty of training time; most reinforcement
learning methods will need to play a game thousands, or perhaps even millions, of
times in order to play it well. Therefore, it is very useful to have a way of playing
the game much faster than real-time (or a very large server farm). Some reinforcement
learning algorithms, but not all, also require a forward model. Once it has been
trained, a reinforcement-learned policy can usually be executed very fast.

It is important to note that the planning-based methods (described in the previous
section) for playing games cannot be directly compared with the reinforcement
learning methods described in this section. They solve different problems: planning
requires a forward model and significant time at each time step; reinforcement
learning instead needs learning time and may or may not need a forward model.


3.3.2.1 Classic and Deep Reinforcement Learning 

As already mentioned in the introduction of this book, classic reinforcement learning
methods were used with games early on, in some cases with considerable success.
In 1959, Arthur Samuel devised an algorithm, which can be said to be the first
classic reinforcement learning algorithm, to create a self-learning Checkers
player. Despite the very limited computational resources of the day, the algorithm
learned to play well enough to beat its creator 15911 . Another success for classic
reinforcement learning in game-playing came a few decades later, when Gerald
Tesauro used the modern formulation of temporal difference learning to teach a
simple neural network, named TD-gammon, to play Backgammon; it learned to play
surprisingly well, after starting with no information and simply playing against itself
16891 (TD-gammon is covered in more detail in Chapter 2). This success motivated
much interest in reinforcement learning during the 1990s and early 2000s.

However, progress was limited by the lack of good function approximators for
the value function (e.g., the Q function). While algorithms such as Q-learning will
provably converge to the optimal policy under the right conditions, the right conditions
are in fact very restrictive. In particular, they include that all state values or {state,
action} values are stored separately, for example, in a table. However, for most
interesting games there are far too many possible states for this to be feasible;
almost any video game has at least billions of states. This means that the table would
be too big to fit in memory, and that most states would never be visited during learning.
It is clearly necessary to use a compressed representation of the value function
that occupies less memory and also does not require every state to be visited in order
to calculate its value. It can, instead, calculate it based on neighboring states that
have been visited. In other words, what is needed is a function approximator, such
as a neural network.
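
For reference, the tabular case that the paragraph above describes can be sketched in a few lines. The environment here is an invented five-state corridor with a reward only at the right end, and the learning rate, discount factor and exploration rate are arbitrary illustrative values. The table `q` holds one separate entry per {state, action} pair, which is exactly what stops scaling once a game has billions of states.

```python
import random

N_STATES = 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Toy corridor: reward 1.0 for reaching the rightmost state, else 0.0."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # one entry per (state, action)
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                best = max(q[state])          # exploit, breaking ties randomly
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            nxt, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best value of the next state.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q
```

After training, the greedy action in every non-terminal state of this corridor is to move right. Replacing the table with a function approximator means replacing the indexed update with a gradient step towards the same target.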

However, using neural networks together with temporal difference learning turns
out to be non-trivial. It is very easy to encounter “catastrophic forgetting”, where
sophisticated strategies are unlearned in favor of degenerate strategies (such as always
taking the same action). The reasons for this are complex and go beyond the
discussion in this chapter. However, to intuitively understand one of the mechanisms
involved, consider what would usually happen for a reinforcement learning
agent playing a game. Rewards are very sparse, and the agent will typically see long
stretches of no reward, or negative reward. When the same reward is encountered
for a long time, the backpropagation algorithm will be trained only with the target
value of that reward. In terms of supervised learning, this is akin to training for a
long time on a single training example. The likely outcome is that the network learns
to only output that target value, regardless of the input. More details on the method
of approximating a value function using an ANN can be found in Section 2.8.2.3.

A major success in the use of reinforcement learning of the temporal difference
variety together with function approximators came in 2015, when Google DeepMind
published a paper where they managed to train deep neural networks to play
a number of different games from the classic Atari 2600 games console [464]. Each
network was trained to play a single game, with the inputs being the raw pixels of
the game’s visuals, together with the score, and the output being the controller’s
directions and fire button. The method used to train the deep networks is deep Q
networks, which is essentially standard Q-learning applied to neural networks with
many layers (some of the layers used in the architecture were convolutional).
Crucially, they managed to overcome the problems associated with using temporal
difference techniques together with neural networks by a method called experience
replay. Here, short sequences of gameplay are stored, and replayed to the network
in varying order, in order to break up the long chains of similar states and rewards.
This can be seen as akin to batch-based training in supervised learning, using small
batches.
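
As an illustration of the idea, the following is a minimal sketch of a replay
memory together with the one-step Q-learning target. It is not the implementation
used in the Atari work; the class name ReplayBuffer, the function q_targets, the
discount factor and the tuple layout are all assumptions made for this example.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of past transitions, sampled in random order."""

    def __init__(self, capacity, seed=None):
        self.memory = deque(maxlen=capacity)  # oldest transitions drop out first
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks up runs of consecutive,
        # highly similar transitions.
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


def q_targets(batch, q_values, gamma=0.99):
    """One-step Q-learning targets r + gamma * max_a' Q(s', a') for a batch.

    `q_values` is any callable mapping a state to a list of action values
    (a hypothetical stand-in for the neural network).
    """
    targets = []
    for _state, _action, reward, next_state, done in batch:
        bootstrap = 0.0 if done else gamma * max(q_values(next_state))
        targets.append(reward + bootstrap)
    return targets
```

In a training loop, the agent would add every transition it experiences to the
buffer and periodically fit the network to the targets of a freshly sampled batch.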


3.3.2.2 Evolutionary Reinforcement Learning

The other main family of reinforcement learning methods is evolutionary methods.
In particular, evolutionary algorithms can be used to evolve the weights and/or
topology of neural networks (neuroevolution) or programs, typically structured as
expression trees (genetic programming). The fitness evaluation consists of using the
neural network or program to play the game, and using the result (e.g., the score)
as the fitness value.
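
The loop described above can be sketched as follows. This is only one of many
possible setups: we assume a simple evolution strategy with truncation selection
and Gaussian mutation over a fixed-length weight vector, and the "game"
(play_toy_game) is a hypothetical stand-in in which a single linear unit must walk
to position 10 on a line.

```python
import random


def evolve(fitness, genome_length, population_size=20, generations=50,
           mutation_std=0.1, seed=0):
    """Evolution strategy with elitism over flat weight vectors.

    `fitness` maps a weight vector to a score (higher is better); in a game
    it would run one or more playthroughs and return, e.g., the final score.
    """
    rng = random.Random(seed)
    population = [[rng.gauss(0.0, 1.0) for _ in range(genome_length)]
                  for _ in range(population_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        elite = ranked[:population_size // 2]          # keep the better half
        offspring = [[w + rng.gauss(0.0, mutation_std) for w in parent]
                     for parent in elite]              # one mutated copy each
        population = elite + offspring
    return max(population, key=fitness)


def play_toy_game(weights, steps=20):
    """Hypothetical game: the sign of one linear unit selects left or right."""
    position, goal = 0, 10
    for _ in range(steps):
        distance = goal - position
        activation = weights[0] * distance + weights[1]
        position += 1 if activation > 0 else -1
    return -abs(goal - position)  # fitness: ending closer to the goal is better


best_weights = evolve(play_toy_game, genome_length=2)
```

Because the better half of the population always survives, the best fitness in
the population can never decrease from one generation to the next.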

This basic idea has been around for a long time, but remained surprisingly
under-explored. John Koza, a prominent researcher within genetic programming,
used an example of evolving programs for playing Pac-Man in his 1992
book [356]. A couple of years later, Pollack and Blair showed that evolutionary
computation can be used to train backgammon players using the same setup as Tesauro
used in his experiments with TD learning, and with similar results [537]. Outside
of games, a community of researchers was forming in the 1990s exploring the idea
of using evolutionary computation to learn control strategies for small robots; this
field came to be called evolutionary robotics [496]. Training robots to solve simple
tasks such as navigation, obstacle avoidance and situational learning has very
much in common with training NPCs to play games, in particular two-dimensional
arcade-like games [567, 767, 766].

Starting around 2005, a number of advances were made in applying neuroevolution
to playing different types of video games. This includes applications to car
racing [707, 709, 392, 353], first-person shooters [518], strategy games [79],
real-time strategy games [654] and classic arcade games such as Pac-Man [766, 403].
Perhaps the main takeaway from this work is that neuroevolution is extremely
versatile, and can be applied to a wide range of games, usually in several different
ways for each game. For example, for a simple car racing game it was shown that
evolving neural networks that acted as state evaluators, even in combination with a
simple one-step lookahead search, substantially outperformed evolving neural
networks working as action evaluators (Q functions) [408]. Input representation
matters too; as discussed in Section 3.2.2.1, egocentric inputs are generally
strongly preferred, and there are additional considerations for individual game
types, such as how to represent multiple adversaries [654].

Neuroevolution has seen great success in learning policies in cases where the
state can be represented using relatively few dimensions (say, fewer than 50 units
in the neural network’s input layer), and is often easier to tune and get working
than classic reinforcement learning algorithms of the temporal difference variety.
However, neuroevolution seems to have problems scaling up to problems with very
large input spaces that require large and deep neural networks, such as those using
high-dimensional pixel inputs. The likely reason for this is that stochastic search in
weight space suffers from the curse of dimensionality in a way that gradient descent
search (such as backpropagation) does not. Currently, almost all successful
examples of learning directly from high-dimensional pixel inputs use deep Q-learning
or similar methods, though there are approaches that combine neuroevolution with
unsupervised learning, so that controllers are learned that use a compressed
representation of the visual feed as input 0531 .

For more details on the general method of neuroevolution and pointers to the
literature the reader is referred to Section 2.8.1 and to the recent survey paper on
neuroevolution in games [567].


3.3.3 Supervised Learning 

Games can also be played using supervised learning. Or rather, policies or
controllers for playing games can be learned through supervised learning. The basic
idea here is to record traces of human players playing a game and train some
function approximator to behave like the human player. The traces are stored as lists
of tuples <features, target> where the features represent the game state (or an
observation of it that would be available to the agent) and the target is the action
the human took in that state. Once the function approximator is adequately trained,
the game can be played, in the style of the human(s) it was trained on, by simply
taking whatever action the trained function approximator returns when presented with
the current game state. Alternatively, instead of learning to predict what action to
take, one can also learn to predict the value of states, and use the trained function
approximator in conjunction with a search algorithm to play the game. Further details
about the supervised learning algorithms that can be used in games are given in
Chapter 2.
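
To make the <features, target> representation concrete, here is a small sketch
that imitates recorded play. We deliberately use k-nearest-neighbor voting as the
function approximator because it needs no training loop; a neural network or any
other classifier could take its place. The single feature used in the example
traces (the horizontal offset to the nearest pill) and the action names are
hypothetical.

```python
from collections import Counter


def train_imitation_policy(traces, k=3):
    """Return a policy that imitates recorded <features, target> tuples."""

    def policy(features):
        def squared_distance(example):
            recorded_features, _action = example
            return sum((a - b) ** 2 for a, b in zip(recorded_features, features))

        # Vote among the k recorded states most similar to the current one.
        nearest = sorted(traces, key=squared_distance)[:k]
        votes = Counter(action for _features, action in nearest)
        return votes.most_common(1)[0][0]

    return policy


# Hypothetical traces: the human moved towards the nearest pill.
traces = [((-2.0,), "left"), ((-1.0,), "left"), ((-0.5,), "left"),
          ((0.5,), "right"), ((1.0,), "right"), ((2.0,), "right")]
policy = train_imitation_policy(traces)
```

Calling policy((1.5,)) then returns the action the recorded player most often
took in the most similar recorded states.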


3.3.4 Chimeric Game Players 

While planning, reinforcement learning and supervised learning are fundamentally
different approaches to playing games, solving the game-playing problem under
different constraints, this does not mean that they cannot be combined. In fact,
there are many examples of successful hybrids or chimeras of approaches from
these three broad classes. One example is dynamic scripting [650], which can be
viewed as a form of a learning classifier system [363] in that it involves a
rule-based (here called script-based) representation coupled with reinforcement
learning. Dynamic scripting adjusts the importance of scripts via reinforcement
learning at runtime, based on the current game state and the immediate rewards
obtained. Dynamic scripting has seen several applications in games including
fighting games gni and real-time strategy games [409, 154]. The approach has been
used mainly for AI that adapts to the skills of the player, thereby aiming at the
experience of the player and not necessarily at winning the game.

Another good example is AlphaGo. This extremely high-performing Go-playing
agent actually combines planning through search, reinforcement learning and
supervised learning [629]. At the core of the agent is a Monte Carlo tree search
algorithm which searches in the state space (planning). Rollouts, however, are
combined with evaluations from a neural network which estimates the value of states
(the value network), and node selection is informed by a position estimation network
(the policy network). Both the state network and position network are initially
trained on databases of games between grandmasters (supervised learning), and later
on further trained by self-play (reinforcement learning).


3.4 Which Games Can AI Play? 

Different games pose different challenges for AI playing, in the same way they pose 
different challenges for human playing. Not only are there differences in what kind 
of access the AI player has to the games, but also between different game types: 
a policy for playing Chess is unlikely to be proficient at playing the games in the 
Grand Theft Auto (Rockstar Games, 1997-2013) series. This section is organized 
according to game genres, and for each game genre it discusses what the particu- 
lar cognitive, perceptual, behavioral and kinesthetic challenges games of that genre 
generally pose, and then gives an overview of how AI methods have been used to 
play that particular game genre. It also includes several extended examples, giving 
some detail about particular implementations. Once again it is important to note that 
the list is not inclusive of all possible game genres AI can play as a player or non- 
player character; the selection is made on the basis of popularity of game genres and 
the available published work on AI for playing games in each genre. 


3.4.1 Board Games 

As discussed in the introduction of this book, the earliest work on AI for playing 
games was done in classic board games, and for a long time that was the only way in 
which AI was applied to playing games. In particular, Chess was so commonly used 
for AI research that it was called the “drosophila of artificial intelligence” dll,
alluding to the use of the common fruit fly as a model organism in genetics research.
The reasons for this seem to have been that board games were simple to implement,
indeed possible at all to implement on the limited computer hardware available in
the early days of AI research, and that these games were seen to require something
akin to “pure thought”. What a game such as Chess or Go does require is adversarial
planning. Classic board games typically place no demand at all on perception,
reactions, motor skills or estimation of continuous movements, meaning that their
skill demands are particularly narrow, especially compared to most video games.

Most board games have very simple discrete state representations, deterministic
forward models and reasonably small branching factors: the full state of the game
can often be represented in fewer than 100 bytes, and calculating the next state of
the game is as simple as applying a small set of rules. This makes it very
easy to apply tree search, and almost all successful board game-playing agents use
some kind of tree search algorithm. As discussed in the sections on tree search in
Chapter 2, the Minimax algorithm was originally invented in the context of playing
Chess. Decades of research concentrated on playing Chess (with a lesser amount of
research on Checkers and Go), with specific conferences dedicated to this kind of
research, led to a number of algorithmic advances that improved the performance of
the Minimax algorithm on some particular board game. Many of these have limited
applicability outside of the particular game they were developed on, and it would
take us too far afield to go into these algorithmic variations here. For an overview
of advances in Chess playing, the reader is referred to [98].

In Checkers the reigning human champion was beaten by the Chinook software
in 1994 [594] and the game was solved in 2007, meaning that the optimal set of
moves for both players was found (it is a draw if you play optimally) [593]; in
Chess, Garry Kasparov was famously beaten by Deep Blue in 1997 [98]. It took until
2016 for Google DeepMind to beat a human Go champion with their AlphaGo
software [629], mainly because of the algorithmic advances necessary. Whereas Chess
and Checkers can be played effectively with some variation of the Minimax
algorithm combined with relatively shallow state evaluations, the larger branching
factor of Go necessitated and spurred the development of MCTS ITTlI .

While MCTS can be utilized to play board games without a state evaluation
function, supplementing that algorithm with state and action evaluation functions can
massively enhance the performance, as seen in the case of AlphaGo, which uses
deep neural networks for state and action evaluation. On the other hand, when using
some version of Minimax it is necessary to use state evaluation functions, as all
interesting board games (more complex than Tic-Tac-Toe) have state spaces too large
to be searched until the end of the game in acceptable time. These evaluation
functions can be manually constructed, but in general it is a very good idea to use
some form of learning algorithm to learn their parameters (even though the structure
of the function is specified by the algorithm designer). As discussed above, Samuel
was the first to use a form of reinforcement learning to learn a state evaluation
function in a board game (or any kind of game) [591], and Tesauro later used TD
learning to very good effect in Backgammon [689]. Evolutionary computation can also
be used to learn evaluation functions; for example, Pollack showed that co-evolution
could perform well on Backgammon using a very similar setup to Tesauro [532].
Noteworthy examples of strong board game players based on evolved evaluation
functions are Blondie24 [207] and Blondie25 [208], a Checkers- and a Chess-playing
program respectively. The evaluation functions were based on five-layered deep
convolutional networks, and Blondie25 in particular performed well against very
strong Chess players.
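
The role of the state evaluation function in a depth-limited Minimax search can
be sketched as follows. The code is generic; the game used to exercise it is a
hypothetical toy version of Nim (players alternately take one to three stones,
and whoever takes the last stone wins), and its placeholder evaluation of 0.0 for
unfinished games stands in for the learned or hand-built evaluators discussed
above.

```python
def minimax(state, depth, maximizing, moves, apply_move, evaluate):
    """Depth-limited Minimax; returns (value, best_move).

    `moves(state)` lists legal moves (empty when the game is over),
    `apply_move(state, move)` is the forward model, and
    `evaluate(state, maximizing)` scores a state for the maximizing player.
    """
    legal = moves(state)
    if depth == 0 or not legal:
        return evaluate(state, maximizing), None
    best_value, best_move = None, None
    for move in legal:
        value, _ = minimax(apply_move(state, move), depth - 1,
                           not maximizing, moves, apply_move, evaluate)
        if (best_value is None
                or (maximizing and value > best_value)
                or (not maximizing and value < best_value)):
            best_value, best_move = value, move
    return best_value, best_move


def nim_moves(stones):
    return [take for take in (1, 2, 3) if take <= stones]


def nim_apply(stones, take):
    return stones - take


def nim_evaluate(stones, maximizing):
    if stones == 0:
        # The player who just moved took the last stone and wins.
        return -1.0 if maximizing else 1.0
    return 0.0  # placeholder heuristic for unfinished games


value, move = minimax(5, 10, True, nim_moves, nim_apply, nim_evaluate)
```

With five stones and a sufficiently deep search, the maximizing player finds the
winning move of taking one stone; with a depth of one, the search only sees the
uninformative placeholder evaluation, which is exactly where a better evaluation
function would help.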

While classic board games such as Go and Chess have existed for hundreds or
even thousands of years, the past few decades have seen a rejuvenation of board
game design. Many of the more recently designed board games mix up the formula
of classic board games with design thinking from other game genres. A good
example is Ticket to Ride (Days of Wonder, 2004), which is a board game that includes
elements of card games, such as variable numbers of players, hidden information
and stochasticity (in the draw of the cards). For these reasons, it is hard to
construct well-performing AI players based on standard tree-search methods; the best
known agents include substantial domain knowledge yet perform poorly compared to
human players [160]. Creating generic well-performing agents for this type of game is
an interesting research challenge.

Given the simplicity of using tree search for board game playing, it is not surpris- 
ing that every approach we have discussed so far builds on one tree search algorithm 
or another. However, it is possible to play board games without forward models— 
usually with results that are “interesting” rather than good. For example, Stanley and 
Miikkulainen developed a “roving eye” approach to playing Go, where an evolved 
neural network self-directedly scans the Go board and decides where to place the 
next piece [656]. Relatedly, it is reportedly possible for the position evaluation
network of AlphaGo to play a high-quality game of Go on its own, though it naturally
plays stronger if combined with search.


3.4.2 Card Games 

Card games are games centered on one or several decks of cards; these might or 
might not be the standard 52-card French deck which is commonly used in classic
card games. Most card games involve players possessing different cards that change 
ownership between players, or between players and the deck, or other positions on 
the table. Another important element of most card games is that some cards are 
visible to the player who possesses them but not to other players. Therefore, almost 
all card games feature a large degree of hidden information. In fact, card games 
are perhaps the type of games where hidden information most dominates gameplay. 

For example, take the classic card game Poker, which is currently very popular
in its Texas hold ’em variety. The rules are relatively simple: the player who at
the end of a few rounds holds the best cards (the “best hand”) wins. Between the
rounds, players bet and, depending on the variant, either exchange a number of cards
for fresh cards drawn from the deck (draw Poker) or see further shared cards revealed
on the table (Texas hold ’em). If there were perfect information, i.e., all players
could see each other’s hands, this would be an uninteresting game that could be
played according to a lookup table. What makes Texas hold ’em, and similar Poker
variants, challenging and interesting is that each player does not know what cards
the other players have. The cognitive challenges of playing these games involve
acting in the absence of information, which implies inferring the true game state
from incomplete evidence, and potentially affecting other players’ perception of the
true game state. In other words, a game of Poker is largely about guessing and
bluffing.

A key advance in playing Poker and similar games is the Counterfactual Regret
Minimization (CFR) algorithm [797]. In CFR, agents learn by self-play in a
similar fashion to how temporal difference learning and other reinforcement
learning algorithms have been used in perfect information games like Backgammon and
Checkers. The basic principle is that after every action, when some hidden state has
been revealed, the algorithm computes the alternative reward of all other actions
that could have been taken, given the newly revealed information. The difference
between the reward attained from the action that was actually taken and the best
action that could have been taken is called the regret. The policy is then adjusted
so as to minimize the regret. This is done iteratively, slowly converging on a policy
that is optimal in the sense that it loses as little as possible over a large number
of games. However, for games as complex as Texas hold ’em, simplifications have to be
made in order to use the CFR algorithm in practice.
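
The regret bookkeeping can be illustrated with regret matching, the update rule
that CFR applies at every decision point. The sketch below is not full CFR: it
has no game tree, no hidden state and no counterfactual weighting, and it learns
against a fixed opponent in Rock-Paper-Scissors rather than by self-play. It
accumulates a regret for every available action, and uses exact expected rewards
instead of sampled play so that the result is deterministic. The function names
and the opponent's strategy are assumptions made for this example.

```python
ACTIONS = ("rock", "paper", "scissors")
# PAYOFF[a][b]: reward to the player choosing a against an opponent choosing b.
PAYOFF = {
    "rock": {"rock": 0, "paper": -1, "scissors": 1},
    "paper": {"rock": 1, "paper": 0, "scissors": -1},
    "scissors": {"rock": -1, "paper": 1, "scissors": 0},
}


def strategy_from_regrets(regrets):
    """Regret matching: play actions in proportion to their positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0:
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def train(opponent_strategy, iterations=1000):
    """Return the learner's average strategy against a fixed mixed strategy."""
    regrets = [0.0] * len(ACTIONS)
    strategy_sum = [0.0] * len(ACTIONS)
    for _ in range(iterations):
        strategy = strategy_from_regrets(regrets)
        # Reward each action would have earned against the opponent's mix.
        action_values = [sum(q * PAYOFF[a][b]
                             for b, q in zip(ACTIONS, opponent_strategy))
                         for a in ACTIONS]
        expected = sum(p * v for p, v in zip(strategy, action_values))
        for i in range(len(ACTIONS)):
            regrets[i] += action_values[i] - expected  # how much better i was
            strategy_sum[i] += strategy[i]
    total = sum(strategy_sum)
    return [s / total for s in strategy_sum]


average_strategy = train(opponent_strategy=[0.5, 0.25, 0.25])
```

Against an opponent who plays rock half of the time, the average strategy
quickly concentrates on paper, the action with the least regret.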

DeepStack is a recent agent and algorithm that has reached world-class
performance in Texas hold ’em [467]. Like CFR, DeepStack uses self-play and recursive
reasoning to learn a policy. However, it does not compute an explicit strategy
before play. Instead, it uses tree search in combination with a state value
approximation to select actions at each turn. In this sense, it is more like the
heuristic search of AlphaGo (but in a setting with plenty of imperfect information)
than like the reinforcement-learned policy of TD-Gammon.

Another, much more recent, card game which is drawing increasing interest from
the research community is Hearthstone (Blizzard Entertainment, 2014); see Fig. 3.4.
This is a collectible card game in the tradition of Magic: The Gathering, but with
somewhat simpler rules and only played on computers. A game of Hearthstone takes
place between two players, with each player having a deck of 30 cards. Each card
represents either a creature or a spell. Each player has a handful of cards (< 7) in
hand (invisible to the other player), and at each turn draws a new card and has
the option of playing one or more cards. Creature cards convert to creatures that
are placed on the player’s side of the table (visible to both players), and creatures
can be used to attack the opponent’s creatures or player character. Spells have a
multiplicity of different effects. The hundreds of different cards in the game, the
possibility of choosing to take multiple actions each turn, the long time taken to
play a game (20 to 30 turns is common), the presence of stochasticity and of course
the hidden information (mainly what cards are in the opponent’s hand) conspire
to make Hearthstone (Blizzard Entertainment, 2014) a hard game to play for both
humans and machines.

Perhaps the simplest approach to playing Hearthstone (Blizzard Entertainment,
2014) is to simply ignore the hidden information and play each turn in a greedy
fashion, i.e., search the space of possible actions within a single turn and choose
the one that optimizes some criterion such as health point advantage at the end of
that turn, given the available information only. Agents that implement such greedy
policies are included with some open source Hearthstone simulators, such as
Metastone (http://www.demilich.net/). Standard tree search algorithms such as Minimax
or MCTS are generally ineffective here (as in Poker) because of the very high degree
of hidden information. One approach to constructing high-performing agents is instead
to hand-code domain knowledge, for example, by building an ontology of cards and
searching in an abstract symbolic space [659].

Fig. 3.4 A screenshot from Hearthstone (Blizzard Entertainment, 2014) displaying a number of
different creature or spell cards available in the game. Image obtained from Wikipedia (fair use).
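
The greedy single-turn policy described above can be sketched as an exhaustive
search over what to play this turn. The sketch is heavily simplified and is not
taken from any Hearthstone simulator: cards are hypothetical dictionaries with a
cost and a damage value, the order of play and the choice of targets are ignored,
and the criterion is simply the total damage dealt.

```python
from itertools import combinations


def greedy_turn(hand, mana, evaluate):
    """Try every affordable subset of the hand and keep the best-scoring one."""
    best_play, best_score = (), evaluate(())  # playing nothing is the baseline
    for size in range(1, len(hand) + 1):
        for play in combinations(hand, size):
            if sum(card["cost"] for card in play) <= mana:
                score = evaluate(play)
                if score > best_score:
                    best_play, best_score = play, score
    return best_play, best_score


def total_damage(play):
    """Hypothetical criterion: health point swing caused by the played cards."""
    return sum(card["damage"] for card in play)


hand = [{"name": "A", "cost": 2, "damage": 3},
        {"name": "B", "cost": 3, "damage": 4},
        {"name": "C", "cost": 5, "damage": 6}]
play, score = greedy_turn(hand, mana=5, evaluate=total_damage)
```

With five mana, playing A and B together (7 damage) beats playing C alone
(6 damage). Nothing in this search considers the opponent's hidden hand or any
later turn, which is what makes the policy greedy.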

Unlike in Poker, where the player has no control over what cards it is dealt, in
Hearthstone (Blizzard Entertainment, 2014) the player can also construct a deck
with which to play the game. This adds another level of challenge to playing the
game; in addition to choosing what action(s) to take at each turn, the successful
player must also construct a deck that allows her to implement her strategy. The
composition of the deck effectively constrains what strategy can be chosen, and then
implemented tactically through action selection. While these two levels interplay
(a strong player takes the composition of the deck and the strategy it affords into
account when choosing a move, and vice versa), it is also true that the problems of
deck building and action selection can to some extent be treated separately, and
implemented in different agents. One approach to deck building is to use evolutionary
computation. The deck is seen as the genome, and the fitness function involves
simple heuristic agents using the deck for playing [218]. A similar approach has also
been used in the multi-player card game Dominion [416].


3.4.3 Classic Arcade Games 

Classic arcade games, of the type found in late 1970s and early 1980s arcade
cabinets, home video game consoles and home computers, have been commonly used as
AI benchmarks within the last decade. Representative platforms for this game type
are the Atari 2600, Nintendo NES, Commodore 64 and ZX Spectrum. Most classic
arcade games are characterized by movement in a two-dimensional space (sometimes
represented isometrically to provide the illusion of three-dimensional movement),
heavy use of graphical logics (where game rules are triggered by intersection
of sprites or images), continuous-time progression, and either continuous-space or
discrete-space movement.

(a) Track & Field (Konami, 1983) is a game about athletics. The game screenshot depicts
the start of the 100 m dash. Image obtained from Wikipedia (fair use).

(b) In Tapper (Bally Midway, 1983) the player controls a bartender who serves
drinks to customers. Image obtained from Wikipedia (fair use).

Fig. 3.5 Track & Field (Konami, 1983), Tapper (Bally Midway, 1983), and most classic arcade
games require rapid reactions and precision.

The cognitive challenges of playing such games vary by game. Most games require
fast reactions and precise timing, and a few games, in particular early sports
games such as Track & Field (Konami, 1983) and Decathlon (Activision, 1983), rely
almost exclusively on speed and reactions (Fig. 3.5(a)). Very many games require
prioritization of several co-occurring events, which requires some ability to predict
the behavior or trajectory of other entities in the game. This challenge is explicit
in e.g., Tapper (Bally Midway, 1983) (see Fig. 3.5(b)) but is also, in different
ways, part of platform games such as Super Mario Bros (Nintendo, 1985), shooting
galleries such as Duck Hunt (Nintendo, 1984) or Missile Command (Atari Inc., 1980)
and scrolling shooters such as Defender (Williams Electronics-Taito, 1981) or
R-Type (Irem, 1987). Another common requirement is navigating mazes or other complex
environments, as exemplified most clearly by games such as Pac-Man (Namco,
1980), Ms Pac-Man (Namco, 1982), Frogger (Sega, 1981) and Boulder Dash (First
Star Software, 1984), but also common in many platform games. Some games, such
as Montezuma’s Revenge (Parker Brothers, 1984), require long-term planning
involving the memorization of temporarily unobservable game states. Some games
feature incomplete information and stochasticity, others are completely
deterministic and fully observable.


3.4.3.1 Pac-Man and Ms Pac-Man

Various versions and clones of the classic Pac-Man (Namco, 1980) game have been
frequently used in both research and teaching of artificial intelligence, due to the
depth of challenge coupled with conceptual simplicity and ease of implementation.
In all versions of the game, the player character moves through a maze while
avoiding pursuing ghosts. A level is won when all pills distributed throughout the
level are collected. Special power pills temporarily give the player character the
power to consume ghosts rather than being consumed by them. As seen in Chapter 2, the
differences between the original Pac-Man (Namco, 1980) and its successor Ms
Pac-Man (Namco, 1982) may seem minor but are actually fundamental; the most
important is that one of the ghosts in Ms Pac-Man (Namco, 1982) has non-deterministic
behavior, making it impossible to learn a fixed sequence of actions as a solution
to the game. The appeal of this game to the research community is evidenced by a
recent survey covering over 20 years of active AI research using these two games as
testbeds [573].

Several frameworks exist for Pac-Man-based experimentation, some tied to
competitions. The Pac-Man screen capture competition is based around the Microsoft
Revenge of Arcade version of the original game, and provides neither a forward
model nor facilities for speeding up the game [404]. The Ms Pac-Man vs Ghost
Team competition framework is written in Java and includes both a forward model
and the ability to speed up the game significantly; it also includes an interface for
controlling the ghost team rather than Ms Pac-Man, the player character [574]. The
Atari 2600 version of Ms Pac-Man (Namco, 1982) is available as part of the ALE
framework (www.arcadelearningenvironment.org). There is also a Python-based Pac-Man
framework used for teaching AI at UC Berkeley
(http://ai.berkeley.edu/project_overview.html).

As expected, the performance of AI players varies depending on the availability
of a forward model, which allows the simulation of ghost behavior. The screen
capture-based competition, which does not offer a forward model, is dominated by
heuristic approaches (some of them involve pathfinding in the maze without taking
ghost movement into account), which perform at the level of beginner human
players [404]. It has been observed that even searching one step ahead, and using
a state evaluator based on an evolved neural network, can be an effective method
for playing the game [403]. Of course, searching deeper than a single ply yields
additional benefits; however, the stochasticity introduced in Ms Pac-Man (Namco,
1982) poses challenges even in the presence of a forward model. MCTS has been
shown to work well in this case [590, 524]. Model-free approaches to reinforcement
learning have also been used for playing the game with some success Eli. In
general, the best competitors in the Ms Pac-Man vs Ghost Team Competition play at
the level of intermediate-skill human players [574]. At the time of writing,
Ms Pac-Man (Namco, 1982) is reported to be practically solved (reaching the
maximum possible score of 999,990 points) by the Microsoft Maluuba team. The
team used an RL technique called hybrid reward architecture [738] which decomposes
the reward function of the environment into different RL problems (a set of
reward functions) that a corresponding number of agents need to solve. Each agent
selects its actions by considering the aggregated Q values for each action across all
agents.
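
The action selection step of such a decomposed architecture can be sketched in a
few lines. We assume here that aggregation is a plain sum over the agents' Q
values, which is only one possible choice; the three agents (one per reward
component) and their Q values are hypothetical.

```python
def select_action(q_values_per_agent):
    """Pick the action whose Q values, summed over all agents, are highest."""
    n_actions = len(q_values_per_agent[0])
    aggregated = [sum(agent_q[a] for agent_q in q_values_per_agent)
                  for a in range(n_actions)]
    best = max(range(n_actions), key=lambda a: aggregated[a])
    return best, aggregated


# Hypothetical Q values for the actions (up, down, left, right).
pill_agent = [1.0, 0.0, 0.5, 0.0]    # wants to go up towards a pill
ghost_agent = [-5.0, 0.0, 0.0, 1.0]  # strongly dislikes the ghost above
fruit_agent = [0.0, 2.0, 0.0, 0.0]   # wants the fruit below
action, totals = select_action([pill_agent, ghost_agent, fruit_agent])
```

Although the pill agent on its own would move up, the aggregated values favor
moving down, away from the ghost and towards the fruit.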

Pac-Man (Namco, 1980) can also be played for experience rather than
performance. In a series of experiments, neural networks that controlled the ghosts
in a clone of the game were evolved to make the game more entertaining for human
players [766]. The experiment was conceptually based on Malone’s definition of
fun in games as challenge, curiosity and fantasy dimensions Em and sought to
find ghost behavior that maximized these traits. In particular, the fitness was
composed of three factors: 1) the appropriate level of challenge (i.e., when the game
is neither too hard nor too easy), 2) the diversity of ghost behavior, and 3) the
ghosts’ spatial diversity (i.e., when the ghosts’ behavior is explorative rather than
static). The fitness function used to evolve interesting ghost behaviors was
cross-validated via user studies Gia.


3.4.3.2 Super Mario Bros 

Versions and clones of Nintendo’s landmark platformer Super Mario Bros (Nintendo,
1985) have been extremely popular for AI research, including research on
game playing, content generation and player modeling (research using this game is
described in several other parts of this book). A large reason for this is the Mario
AI Competition, which was started in 2009 and included several different tracks
focused on playing for performance, playing in a human-like manner and generating
levels [322, 717]. The software framework for that competition
(http://julian.togelius.com/mariocompetition2009/) was based on
Infinite Mario Bros (Notch, 2008), a Java-based clone of Super Mario Bros (Nintendo,
1985) featuring simple level generation [706, 705]. Different versions of the
competition software, generally referred to as the Mario AI Framework or Mario
AI Benchmark, have since been used in many dozens of research projects. In the
following, we will for simplicity refer to methods for playing various versions of
Super Mario Bros (Nintendo, 1985), Infinite Mario Bros or the Mario AI
Framework/Benchmark simply as playing “Mario”.

The first version of the Mario AI framework could be simulated thousands of times
faster than real-time, but did not include a forward model. Therefore the first
attempt to learn a Mario-playing agent was through learning a function from a state
observation directly to Mario actions [706]. In that project, neural networks were
evolved to guide Mario through simple procedurally generated levels. The inputs were
the presence or absence of environment features or enemies in a coarse grid centered
on Mario, and the outputs were interpreted as the button presses on the Nintendo
controller (up, down, left, right). See Fig. 3.6 for an illustration of the state
representation. A standard feedforward MLP architecture was used for the neural
network, and the fitness function was simply how far the controller was able to
progress on each level. Using this setup and a standard evolution strategy, neural
networks were evolved that could win some levels but not all, and generally played at
the strength of a human beginner.

Fig. 3.6 The inputs to the Mario-playing neural network are structured as a Moore neighborhood
centered on Mario. Each input is 1 if the corresponding tile is occupied by something (such as
ground) that Mario cannot pass through, and 0 otherwise. In another version of the experiment, a
second set of inputs was added where an input was 1 if there was an enemy at the corresponding
tile. Image adapted from [706].
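
The grid-based observation described above could be computed along the following
lines. This is a sketch rather than the code of the Mario AI framework: the level
format (a list of strings in which '#' marks an impassable tile), the function
name and the treatment of cells outside the level as empty are assumptions made
for this example.

```python
def grid_observation(level, mario_row, mario_col, radius=1):
    """Flatten a (2 * radius + 1)^2 window around Mario into 0/1 inputs.

    `level` is a list of strings where '#' marks an impassable tile.
    Cells that fall outside the level are treated as empty.
    """
    inputs = []
    for row in range(mario_row - radius, mario_row + radius + 1):
        for col in range(mario_col - radius, mario_col + radius + 1):
            inside = 0 <= row < len(level) and 0 <= col < len(level[row])
            inputs.append(1 if inside and level[row][col] == "#" else 0)
    return inputs


level = ["....",
         "..#.",
         "####"]
observation = grid_observation(level, mario_row=1, mario_col=1)
```

The resulting list is what would be fed to the input layer of the network; a
second list built in the same way from enemy positions would give the extended
input set mentioned in the caption of Fig. 3.6.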

However, as is so often the case, having a forward model makes a big difference. 
The first Mario AI Competition, in 2009, was won by Robin Baumgarten, who con- 
structed a forward model for the game by reusing some of the open-source game 
engine code II705L Using this model, he constructed an agent based on A* search in 
state space. At each time frame, the agent searches for the shortest path towards the 
right edge of the screen, and executes the first action in the resulting plan. As the 
search utilizes the forward model and therefore takes place in state space rather than 
just physical space, it can incorporate the predicted movements of the (determinis- 
tic) enemies in its planning. This agent was able to finish all the levels used in the 
2009 Mario AI Competition, and produces behavior that appears optimal in terms 
of time to complete levels (it does not focus on collecting coins or killing enemies). 
See Section 2.3.2 for an explanation of the algorithm and a figure illustrating its use 
in Mario.

Fig. 3.7 An example level generated for the 2010 Mario AI Competition. Note the overhanging
structure in the middle of the screenshot, creating a dead end for Mario; if he chooses to go beneath 
the overhanging platform, he will need to backtrack to the start of the platform and take the upper 
route instead after discovering the wall at the end of the structure. Agents based on simple A* 
search are unable to do this. 


Given the success of the A*-based agent in the 2009 competition, the next year’s 
edition of the competition updated the level generator so that it generated more 
challenging levels. Importantly, the new level generator created levels that included 
“dead ends”, structures where Mario can take the wrong path and if so must
backtrack to take the other path 0221 . See Fig. 3.7 for an example of such a dead end.
These structures effectively “trap” agents that rely on simple best-first search, as
they end up searching a very large number of paths close to the current position,
and time out before finding a path that backtracks all the way to the beginning of the
structure. The winner of the 2010 Mario AI Competition was instead the REALM 
agent ll5^ . This agent uses an evolved rule-based system to decide sub-goals within 
the current segment of the level, and then navigates to these sub-goals using A*. 
REALM successfully handles the dead ends that were part of the levels in the 2010 
competition, and is as far as we know the highest-performing Mario-playing agent 
there is. 

Other search algorithms beyond A* have been tried for playing Mario, including 
Monte Carlo tree search 12941 . It was found that the standard formulation of MCTS
did not perform very well, because the algorithm did not search deep enough and
because the way the average reward of a branch is calculated results in risk-averse
behavior. However, with certain modifications to remedy these problems, an MCTS
variant was found that could play Mario as well as a pure A* algorithm. In a follow-
up experiment, noise was added to the Mario AI Benchmark and it was found that
MCTS handled this added noise much better than A*, probably because MCTS
relies on statistical averaging of the reward whereas A* assumes a deterministic
world.

All of the above work has been focused on playing for performance. Work on 
playing Mario for experience has mostly focused on imitating human playing styles, 
or otherwise creating agents that play Mario similarly to a human. To further this 
research, a Turing test track of the Mario AI Competition was created 161911 . In this 
track, competitors submitted agents, and their performance on various levels was 
recorded. Videos of the agents playing different levels were shown to human
spectators along with videos of other humans playing the same levels, and the spectators
were asked to indicate which of the videos were of a human player. Agents were
scored based on how often they managed to fool humans, similarly to the setup of
the original Turing test. The results indicated that simple heuristic solutions that included
hand-coded routines for such things as sometimes standing still (giving the
impression of “thinking about the next action”) or occasionally misjudging a jump can
be very effective in giving the appearance of human playing style. Another way of
providing human-like behavior is to explicitly mimic humans by learning from play
traces. Ortega et al. describe a method for creating Mario-playing agents in the style
of particular human players 1151III : evolve neural networks where the fitness
function is based on whether the agent would perform the same action as the human,
when faced with the same situation. This was shown to generate more human-like
behavior than evolving the same neural network architecture with a more
straightforward fitness function. In a similar effort to create human-like Mario AI players,
Munoz et al. 11469 1 used both play traces and information about the player’s eye
position on the screen (obtained via gaze tracking) as inputs of an ANN, which was
trained to approximate which keyboard action is to be performed at each game step. 
Their results yield a high prediction accuracy of player actions and show promise to- 
wards the development of more human-like Mario controllers based on information 
beyond gameplay data. 
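
A fitness function of the kind described for imitating a particular player can be sketched as the fraction of recorded situations in which the agent chooses the same action as the human. The trace format, the state encoding and the two example policies below are hypothetical and serve only to make the idea concrete; the published method may differ in its details.

```python
def imitation_fitness(policy, trace):
    """Fraction of recorded (state, human_action) pairs the policy reproduces."""
    if not trace:
        return 0.0
    matches = sum(1 for state, human_action in trace if policy(state) == human_action)
    return matches / len(trace)


# Hypothetical play trace: each state is (tile_row, obstacle_ahead).
trace = [((0, 1), "jump"), ((0, 0), "right"), ((1, 0), "right"), ((1, 1), "jump")]


def always_right(state):
    return "right"


def jump_at_obstacle(state):
    return "jump" if state[1] == 1 else "right"


print(imitation_fitness(always_right, trace))      # -> 0.5
print(imitation_fitness(jump_at_obstacle, trace))  # -> 1.0
```

In an evolutionary setup, policy would wrap a candidate neural network, and this score would replace the distance-travelled fitness used when playing for performance.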


3.4.3.3 The ALE Framework 

The Arcade Learning Environment (ALE) is an environment for general game- 
playing research based on an emulation of the classic video game console Atari 
2600 m. (While the environment can technically accommodate other emulators as 
well, the Atari 2600 emulator is the one that has been used in practice, to the point 
that the ALE framework is sometimes simply referred to as “Atari”.) The Atari 2600 
is a console from 1977 with 128 bytes of RAM, maximum 32 kilobytes of ROM per
game and no screen buffer, posing severe limitations on the type of games that could
be implemented on the system l466l . ALE provides an interface for agents to control
games via the standard joystick input, but does not provide any processed version
of the internal state; instead, it provides the 160 × 210 pixel screen output to the
agent, which will need to parse this visual information somehow. There is a forward 
model, but it is relatively slow and generally not used. 

Some of the early work using ALE used neuroevolution; in particular a study 
compared several neuroevolution algorithms on 61 Atari games 125111 . They found 
that they could use the popular neuroevolution algorithm NEAT to evolve decent- 
quality players for individual games, provided that these algorithms were given po- 
sitions of in-game objects as recognized by a computer vision algorithm. The Hy- 
perNEAT algorithm, an indirect encoding neuroevolution algorithm that can create 
arbitrarily large networks, was able to learn agents that could play based on the raw 
pixel inputs, even surpassing human performance in three of the tested games. In 
that paper, the neuroevolution approaches generally performed much better than the 
classic reinforcement learning methods tried. 

Later, ALE was used in Google DeepMind’s research on deep Q-learning, which 
was reported in a Nature paper in 2015 II464I . As detailed in Chapter 2, the study
showed that by training a deep neural network (five layers, where the first two
are convolutional) with Q-learning augmented with experience replay, human-level 
playing performance could be reached in 29 out of 49 tested Atari games. That 
research spurred a flurry of experiments in trying to improve the core deep rein- 
forcement formula presented in that paper. 
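
The combination of Q-learning and experience replay mentioned above can be illustrated on a much smaller scale. The sketch below is an assumption-laden toy: it uses a lookup table instead of a deep network, a hypothetical five-state corridor instead of an Atari game, and arbitrarily chosen hyperparameters. It only shows the mechanism of storing transitions and learning from randomly sampled past experience.

```python
import random
from collections import defaultdict, deque


def step(state, action):
    """Toy corridor with states 0..4: action 1 moves right, action 0 moves left.

    Reaching state 4 yields reward 1 and ends the episode.
    """
    nxt = max(0, min(4, state + (1 if action == 1 else -1)))
    done = nxt == 4
    return nxt, (1.0 if done else 0.0), done


def greedy(q, state, rng):
    """Pick the highest-valued action, breaking ties at random."""
    best = max(q[(state, 0)], q[(state, 1)])
    return rng.choice([a for a in (0, 1) if q[(state, a)] == best])


def train(episodes=200, gamma=0.9, alpha=0.5, epsilon=0.2, batch=16, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)        # tabular stand-in for the Q-network
    replay = deque(maxlen=1000)   # experience replay buffer
    for _ in range(episodes):
        s = 0
        for _ in range(50):       # cap on episode length
            a = rng.choice([0, 1]) if rng.random() < epsilon else greedy(q, s, rng)
            s2, r, done = step(s, a)
            replay.append((s, a, r, s2, done))
            # Learn from a random minibatch of stored transitions,
            # not only from the most recent one.
            sample = rng.sample(list(replay), min(batch, len(replay)))
            for bs, ba, br, bs2, bdone in sample:
                target = br if bdone else br + gamma * max(q[(bs2, 0)], q[(bs2, 1)])
                q[(bs, ba)] += alpha * (target - q[(bs, ba)])
            s = s2
            if done:
                break
    return q


q = train()
print([greedy(q, s, random.Random(0)) for s in range(4)])
```

After training, the greedy action in every non-terminal state should be 1 (move right), since the learned values for moving towards the reward exceed those for moving away from it.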

It is worth noting that almost all of the ALE work focuses on learning neural
networks (or occasionally other agent representations) for individual games. That
is, the network architecture and input representation are the same across all games,
but the parameters (network weights) are learned for a single game and can only 
play that game. This seems to be at odds with the idea of general game playing, 
i.e., that you could learn agents that play not a single game, but any game you 
give them. It could also be noted that ALE itself is better suited for research into 
playing individual games than for research on general game playing, as there is 
only a limited number of Atari 2600 games, and it is highly non-trivial to create 
new games for this platform. This makes it possible to tune architectures and even 
agents to individual games. 


3.4.3.4 General Video Game AI 


The General Video Game AI (GVGAI) competition is a game-based AI competition
that has been running since 2014 15281 . It was designed partly as a response to
a trend seen in many of the existing game-based AI competitions, e.g., those
organized at the CIG and AIIDE conferences, that submissions were getting increasingly
game-specific by incorporating more and more domain knowledge. A central idea
of GVGAI is therefore that submissions to the competition are tested on unseen
games, i.e., games that have not been released to the competitors before and which
are therefore impossible to tailor the submissions to. At the time of writing, the
GVGAI repository includes around 100 games, with ten more games being added for
every competition event. 

To simplify the development of games in GVGAI, a language called the Video 
Game Description Language (VGDL) was developed 01811159711 . This language
allows for concise specification of games in a Python-like syntax; a typical game
description is 20-30 lines, with levels specified in separate files. Given the underlying
assumptions of 2D movement and graphical logics, most of the games in the GVGAI
corpus are remakes of (or inspired by) classic arcade games such as Frogger (Sega,
1981) (see Fig. 3.8(b)), Boulder Dash (First Star Software, 1984) or Space Invaders
(Taito, 1978) (see Fig. 3.8(a)), but some are versions of modern indie games such as
A Good Snow Man Is Hard to Build (Hazelden and Davis, 2015). 

The original track of the GVGAI competition, for which the most results are 
available, is the single-player planning track. Here, agents are given a fast forward 
model of the game, and 40 milliseconds to use it to plan for the next action. Given 
these conditions, it stands to reason that planning algorithms of various kinds would 
rule the day. Most of the top performers in this track have been based on
variations on MCTS or MCTS-like algorithms, such as the Open Loop Expectimax
(a) Missile Command
(b) Freeway


Fig. 3.8 Two example games that have been used in the GVGAI competition. 


Tree Search algorithm II528II or MCTS with options search M161I . One surprisingly
high-performing agent uses Iterative Width search, where the core idea is to build
a propositional database of all the facts and then use this to prune a breadth-first
search algorithm to only explore those branches which provide a specified amount
of novelty, as measured by the size of the smallest set of facts seen for the first
time 0891 . Agents based on evolutionary planning also perform well, but not as
well as those based on stochastic tree search or tree search with novelty pruning.
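
The novelty pruning behind Iterative Width can be sketched for the simplest case, in which a state is kept only if it makes at least one individual fact true for the first time (often written IW(1)). The corridor domain, the fact encoding and all function names below are hypothetical, and a competition agent would obtain successor states from the game's forward model rather than from a hand-written function.

```python
from collections import deque


def iw1(start, successors, atoms, is_goal):
    """Breadth-first search that prunes every state contributing no new fact."""
    seen_atoms = set(atoms(start))
    queue = deque([(start, [])])
    while queue:
        state, plan = queue.popleft()
        if is_goal(state):
            return plan
        for action, nxt in successors(state):
            new_atoms = set(atoms(nxt)) - seen_atoms
            if not new_atoms:
                continue  # nothing novel in this state: prune the branch
            seen_atoms |= new_atoms
            queue.append((nxt, plan + [action]))
    return None


# Hypothetical corridor with positions 0..4; a key is picked up at position 2.
def successors(state):
    pos, has_key = state
    for action, delta in (("left", -1), ("right", 1)):
        nxt = pos + delta
        if 0 <= nxt <= 4:
            yield action, (nxt, has_key or nxt == 2)


def atoms(state):
    return [("pos", state[0]), ("key", state[1])]


plan = iw1((0, False), successors, atoms, lambda s: s[0] == 4)
print(plan)  # -> ['right', 'right', 'right', 'right']
```

Every step back to the left is pruned here because it revisits a position fact and a key fact that have both been seen already. The same pruning can make the search fail on goals that require a new combination of facts, which is why larger novelty widths are sometimes needed.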

While some agents are better than others overall, there is clear non-transitivity in
the rankings, in the sense that the best algorithms for one particular game may not
be the best for another game; in fact, there seem to be patterns where families of
algorithms perform better on families of games 02131 l59l . Given these patterns, a 
natural idea for achieving higher playing performance is to use hyper-heuristics or 
algorithm selection to select at runtime which algorithm to use for which game, an 
approach which has seen some success so far isa. 

Two other GVGAI tracks are related to gameplay, namely the two-player planning
track and the learning track. The two-player planning track resembles the
single-player planning track but features a number of two-player games, some of 
which are cooperative and some competitive. At the time of writing, the best agents 
in this track are slightly modified versions of single-player track agents that make 
naive assumptions about the behavior of the other player 02161 : it is expected that 
agents with more sophisticated player models will eventually do better. The learn- 
ing track, by contrast, features single-player games and provides players with time 
to learn a policy but does not provide them with a forward model. It is expected that 
algorithms such as deep reinforcement learning and neuroevolution will do well 
here, but there are as yet no results; it is conceivable that algorithms that learn a 
forward model and then perform tree search will dominate. 


3.4.3.5 Other Environments 

In addition to ALE and GVGAI, there are several other environments that can be 
used for AI experimentation with arcade-style games. The Retro Learning Environment
is a learning environment similar in concept to ALE, but instead based on an
emulation of the Super Nintendo console na. A more general, and less focused,
system is OpenAI Universe, which acts as a unified interface to a large number of
different games, ranging from simple arcade games to complex modern adventure 
games. 


3.4.4 Strategy Games 

Strategy games, particularly computer strategy games, are games where the player 
controls multiple characters or units, and the objective of the game is to prevail in
some sort of conquest or conflict. Usually, but not always, the narrative and graphics 
reflect a military conflict, where units may be e.g., knights, tanks or battleships. The 
perhaps most important distinction within strategy games is between turn-based 
and real-time strategy games, where the former leave plenty of time for the player 
to decide which actions to take each time, and the latter impose a time pressure. 
Well-known turn-based strategy games include epic strategy games such as the Civ- 
ilization (MicroProse, 1991) and the XCOM (MicroProse, 1994) series, as well as 
shorter games such as Hero Academy (Robot Entertainment, 2012). Prominent real- 
time strategy games include StarCraft I (Blizzard Entertainment, 1998) and II (Blizzard
Entertainment, 2010), the Age of Empires (Microsoft Studios, 1997-2016) series
and the Command and Conquer (Electronic Arts, 1995-2013) series. Another
distinction is between single-player games that focus on exploration, such as the
Civilization games, and multi-player competitive games such as StarCraft (Blizzard
Entertainment, 1998-2015). Most, but not all, strategy games feature hidden
information.

The cognitive challenge in strategy games is to lay and execute complex plans
involving multiple units. This challenge is in general significantly harder than the
planning challenge in classical board games such as Chess, mainly because multiple
units must be moved at every turn; the number of units a player controls can
easily exceed the limits of short-term memory. The planning horizon can be extremely
long, where for example in Civilization V (2K Games, 2010) decisions you make
regarding the building of individual cities will affect gameplay for several hundred
turns. The order in which units are moved can significantly affect the outcome of
a move, in particular because a single action might reveal new information
during a move, leading to a prioritization challenge. In addition, there is the challenge
of predicting the moves of one or several adversaries, who frequently have multiple
units. For real-time strategy games, there are additional perceptual and motoric
challenges related to the speed of the game. This cognitive complexity is mirrored
in the computational complexity for agents playing these games; as discussed in
Section 3.2.1.4, the branching factor for a strategy game can easily reach millions
or more.


^ https://universe.openai.com/

The massive search spaces and large branching factors of strategy games pose
serious problems for most search algorithms, as just searching a single turn forward
might already be infeasible. One way of handling this is to decompose the
problem so that each unit acts on its own; this creates one search problem for each unit, with
the branching factor equivalent to the branching factor of the individual unit. This
has the advantage of being tractable and the disadvantage of preventing coordinated
action among units. Nevertheless, heuristic approaches where units are treated
separately are used in the built-in AI in many strategy games. Not coincidentally, the
built-in AI of many strategy games is generally considered inadequate.
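
The effect of this decomposition on the size of a single turn can be made concrete with a small calculation. The numbers below (six units with ten available actions each) are illustrative assumptions, not figures from any particular game.

```python
def joint_branching(actions_per_unit, units):
    """Number of distinct turns when all units are planned together."""
    return actions_per_unit ** units


def decomposed_branching(actions_per_unit, units):
    """Total options considered when each unit is searched on its own."""
    return actions_per_unit * units


print(joint_branching(10, 6))       # -> 1000000
print(decomposed_branching(10, 6))  # -> 60
```

The joint space grows exponentially with the number of units while the decomposed one grows linearly, which is what makes the decomposition tractable. The price is that combinations of actions across units are never evaluated together.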

In research on playing strategy games, some solutions to this involve cleverly
sub-sampling the space of turns so that standard search algorithms can be used. An
example of such an approach is an MCTS variant based on decomposition through
Naive Sampling 05031 . Another approach is non-linear Monte Carlo, which was
applied to Civilization II (MicroProse, 1996) with very promising results ll65l . The
basic idea here is to sample the space of turns (where each turn consists of actions
for all units) randomly, and get an estimate for the value of each turn by performing
a rollout (take random actions) until a specified point. Based on these estimates, a
neural network was trained to predict the value of turns; regression can then be used
to search for the turn with the highest predicted value.

But planning does not need to be based on tree search. Justesen et al. applied
online evolutionary planning to Hero Academy (Robot Entertainment, 2012), a two-
player competitive strategy game with perfect information and a relatively low number
of moves per turn (five in the standard setting) 13091 . Each chromosome consisted
of the actions to take during a single turn, with the fitness function being the
material difference at the end of the turn. It was found that this approach vastly 
outperformed MCTS (and other tree search algorithms) despite the shallow search 
depth of a single turn, likely because the branching factor made it impossible for the 
tree search algorithms to sufficiently explore even the space of this single turn. 
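
The idea of evolving the actions of a single turn can be sketched as below. This is a minimal illustration under several assumptions: actions are plain integers, the fitness function toy_fitness is a made-up stand-in for the material difference that a forward model would report at the end of the turn, and the algorithm uses only point mutation with elitist truncation selection. The published approach may use different operators.

```python
import random


def evolve_turn(fitness, n_actions, turn_length=5, pop_size=20, generations=30, seed=0):
    """Evolve one turn (a fixed-length action sequence); return (best, initial_best_fitness)."""
    rng = random.Random(seed)
    population = [
        [rng.randrange(n_actions) for _ in range(turn_length)] for _ in range(pop_size)
    ]
    initial_best = max(fitness(turn) for turn in population)
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # elitist truncation selection
        children = []
        for parent in parents:
            child = list(parent)
            # Point mutation: replace one action of the turn at random.
            child[rng.randrange(turn_length)] = rng.randrange(n_actions)
            children.append(child)
        population = parents + children
    return max(population, key=fitness), initial_best


def toy_fitness(turn):
    """Hypothetical stand-in for material difference: reward high-valued,
    non-repeated actions."""
    return sum(turn) - 3 * (len(turn) - len(set(turn)))


best, initial_best = evolve_turn(toy_fitness, n_actions=10)
print(best, toy_fitness(best), initial_best)
```

Because the better half of the population is always carried over unchanged, the best fitness never decreases from one generation to the next. This allows the search to be stopped at any time and the current best turn to be played.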

Evolution has also been used to create agents that play strategy games. For example,
NEAT-based macro-management controllers were trained for the strategy game
Globulation 2 (2009). In that study, however, the NEAT controller does not aim to
win but rather play for the experience; in particular, it is evolved to take macro-
actions (e.g., build planning, battle planning) in order to provide a balanced game
for all players II499II . AI approaches based on artificial evolution have also been
used as playtesting mechanisms for a number of other strategy games II588112971 .


3.4.4.1 StarCraft 

The original StarCraft, released in 1998 by Blizzard Entertainment, is still widely
played competitively, a testament to its strong game design and in particular its
unusual depth of challenge (see Fig. 3.10). It is usually played with the Brood War
expansion, and referred to as SC:BW. The existence of the Brood War API (BWAPI), an
interface for playing the game with artificial agents, has enabled a thriving
community of researchers and hobbyists working on AI for StarCraft (Blizzard
Entertainment, 1998). Several competitions are held annually based on SC:BW and BWAPI,
including competitions at the IEEE CIG and AIIDE conferences fS?). TorchCraft
is an environment built on top of SC:BW and BWAPI to facilitate machine
learning, especially deep learning, research using StarCraft 116811 . In parallel, a similar
API was recently released for interfacing AI agents with the follow-up game
StarCraft II (Blizzard Entertainment, 2010), which is mechanically and conceptually
very similar but with numerous technical differences. Given the existing API and
competitions, almost all existing research has been done using SC:BW.


https://github.com/bwapi/bwapi

As SC:BW is such a complex game and the challenge of playing it well is so immense,
most research focuses on only part of the problem, most commonly through
playing it at some level of abstraction. It is common to divide the different
levels of decision-making in SC:BW (and similar real-time strategy games) into three
levels, depending on the time scale: Strategy, Tactics and Micro-Management (see
Fig. 3.9). So far, no agents have been developed that can play a complete game at
the level of even an intermediate human player; however, there has been significant 
progress on several sub-problems of the very formidable problem of playing this 
game. 

For a fuller overview of research on AI for playing SC:BW, the reader is referred
to a recent survey II504II . Below, we will exemplify some of the research done in this 
space, with no pretense of complete coverage. 

At the micro level, AI plays out over a timescale of usually less than a minute,
where the time between actions is typically on the order of a second. When
focusing on this most low-level form of SC:BW battle, one need not consider base
building, research, fog of war, exploration and many other aspects of the full SC:BW
game. Usually, two factions face off, each with a set of a few or a few dozen units.
The goal is to destroy the opponent’s units. This game mode can be played in the
actual game, which does not allow for significant speed-up and does not provide a
forward model, or in the simulator SparCraft I123L which does provide a forward
model. (There is also a Java version of SparCraft, called JarCraft IMII.)

In the model-free scenario, agents must be based on either hand-crafted policies 
or policies learned through reinforcement learning or supervised learning, without
the luxury of a forward model. Hand-crafted policies can be implemented e.g., based
on potential fields, where different units are attracted to or repelled by other units
in order to create effective combat patterns II242L or on fuzzy logic 15411 . When
it comes to machine learning methods for model-free scenarios, standard 16221 and
deep reinforcement learning 11729 II have been used with some effect to learn policies.
Typically, the problem is decomposed so that a single Q-function is learned that is 
then applied to each unit separately 11729 II . 
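
A hand-crafted potential-field policy of the kind mentioned above can be sketched as a sum of attractive and repulsive forces. The specific rules used here are assumptions chosen for illustration: a unit is drawn towards enemies outside its attack range, pushed away from enemies inside it, and pushed away from allies that are closer than a spacing threshold. The parameter values are arbitrary and the function is not taken from any published bot.

```python
import math


def potential_step(unit, enemies, allies, attack_range=4.0, spacing=1.5):
    """Return a unit-length movement direction (dx, dy) for one unit."""
    fx = fy = 0.0
    for ex, ey in enemies:
        dx, dy = ex - unit[0], ey - unit[1]
        dist = math.hypot(dx, dy) or 1e-9
        # Attract towards enemies out of range, repel from those within range.
        sign = 1.0 if dist > attack_range else -1.0
        fx += sign * dx / dist
        fy += sign * dy / dist
    for ax, ay in allies:
        dx, dy = unit[0] - ax, unit[1] - ay
        dist = math.hypot(dx, dy) or 1e-9
        if dist < spacing:  # avoid clumping with nearby allies
            fx += dx / dist
            fy += dy / dist
    norm = math.hypot(fx, fy)
    return (0.0, 0.0) if norm == 0 else (fx / norm, fy / norm)


print(potential_step((0.0, 0.0), enemies=[(10.0, 0.0)], allies=[]))  # -> (1.0, 0.0)
print(potential_step((0.0, 0.0), enemies=[(2.0, 0.0)], allies=[]))   # -> (-1.0, 0.0)
```

Applying the same function to every unit at each step mirrors the per-unit decomposition described above: no unit reasons about the others' plans, yet approaching, retreating and spreading out emerge from the summed fields.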

Using the SparCraft simulator, we can do more because of the availability of a 
forward model. Churchill and Buro developed a simple approach to dealing with the 
excessive branching factor called Portfolio Greedy Search 01231 . The core idea is 


® http://www.starcraftai.com/

Fig. 3.9 Three different levels of decision making in StarCraft (Blizzard Entertainment, 1998).
The width of the triangle represents amount of information whereas its colored gradient illustrates
the degree of partial observability. For instance, at the highest strategic level both the level of
observability and available information to the player are relatively low. On the other end of the
triangle, at the lowest level of micro-management the player must consider the type, position, and
other dynamic properties of each of the units she controls; that information is mostly observable.
For reference, a full game of StarCraft (Blizzard Entertainment, 1998) often takes around 20 min- 
utes, though there is considerable variation. The image is reproduced with permission from Gabriel 
Synnaeve. 


that instead of selecting between actions for each unit, a small number (a portfolio)
of simple heuristics called scripts are used, and a greedy search algorithm is used to
assign one of these scripts to each unit. This approach prunes the branching factor
drastically, but limits the space of discoverable policies to those that can be described as
combinations of the scripts. Subsequently, it was shown that the portfolio of scripts
idea can be combined with MCTS with good results Olli . Even better results can
be obtained by doing portfolio selection through evolutionary planning 17451 : it is
likely that these ideas generalize to many strategy games and other games with high 
branching factors due to their controlling of many units simultaneously. 
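
The greedy assignment of scripts to units can be sketched as follows. This is a simplified illustration rather than the published algorithm: the evaluate function stands in for a playout with the forward model, the unit and script names are hypothetical labels, and the opponent's script assignment is ignored.

```python
def portfolio_greedy(units, scripts, evaluate, passes=2):
    """Assign one script per unit by greedy, unit-by-unit improvement."""
    # Seed with the single script that scores best when every unit uses it.
    seed = max(scripts, key=lambda s: evaluate({u: s for u in units}))
    assignment = {u: seed for u in units}
    for _ in range(passes):
        for u in units:
            # Try every script for this unit while all other units stay fixed.
            assignment[u] = max(scripts, key=lambda s: evaluate({**assignment, u: s}))
    return assignment


# Hypothetical stand-in for a forward-model playout: count the units whose
# script matches a made-up table of preferred behaviors.
PREFERRED = {"marine": "kite", "zealot": "attack_closest", "medic": "retreat"}


def toy_evaluate(assignment):
    return sum(1 for unit, script in assignment.items() if PREFERRED[unit] == script)


units = ["marine", "zealot", "medic"]
scripts = ["attack_closest", "kite", "retreat"]
print(portfolio_greedy(units, scripts, toy_evaluate))
```

Each pass evaluates only len(units) × len(scripts) assignments instead of all len(scripts) ** len(units) combinations. This is where the drastic pruning comes from, and also why the result is restricted to combinations of the given scripts.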

Moving to the other end of the micro-management-tactics-strategy continuum, 
large-scale strategy adaptation remains a very hard problem. Existing SC:BW bots
are rarely able to implement multiple strategies, let alone adapt their strategy based
on how the game progresses. In order to do this, it is necessary to create a model of

Fig. 3.10 A screenshot from StarCraft: Brood War (Blizzard Entertainment, 1998) displaying a
number of different units available in the game. At the time of writing, playing this real-time
strategy game well is considered one of the next grand challenges for AI research. Image obtained from
Wikipedia (fair use). 


what the opponent is trying to do based on limited evidence. Here, pioneering work 
by Weber and Mateas focused on mining logs of SC:BW matches to predict what
strategy will be taken by a player from early-game actions 17501 . 

A few more ambitious attempts have been made to create complete agents that 
can handle strategy, tactics and micro-management in a principled fashion. For ex- 
ample, Synnaeve and Bessiere built an agent based on Bayesian programming that 
is able to perform reasonably well l680l . 


3.4.5 Racing Games 

Racing games are games where the player is tasked with controlling some kind of
vehicle or character so as to reach a goal in the shortest possible time, or to
traverse as far as possible along a track in a given time. Usually the game employs a
first-person perspective, or a vantage point from just behind the player-controlled
vehicle. The vast majority of racing games take a continuous input signal as a
steering input, similar to a steering wheel. Some games, such as those in the Forza
Motorsport (Microsoft Studios, 2005-2016) or Real Racing (Firemint and EA Games,
2009-2013) series, allow for complex input including gear stick, clutch and
handbrake, whereas more arcade-focused games such as those in the Need for Speed
(Electronic Arts, 1994-2015) series typically have a simpler set of inputs and thus
lower branching factor. Racing games such as those in the WipeOut (Sony Computer
Entertainment Europe, 1995-2012) and Mario Kart (Nintendo, 1992-2017) series
introduce additional elements, such as weapons that can be used to temporarily
incapacitate competitors’ vehicles.

Fig. 3.11 A screenshot from TORCS, which has been predominantly used in the Simulated Car
Racing Championship. Image obtained from https://sourceforge.net/projects/torcs/ (fair use).

While the cognitive challenges in playing racing games may appear simple, most 
racing games actually require multiple simultaneous tasks to be executed and have 
significant skill depth. At the most basic level, the agent needs to control for the 
position of the vehicle and adjust the acceleration or braking, using fine-tuned con- 
tinuous input, so as to traverse the track as fast as possible. Doing this optimally 
requires at least short-term planning, one or two turns (of the track) forward. If there
are resources to be managed in the game, such as fuel, damage or speed boosts, this
requires longer-term planning. When other vehicles are present on the track, there
is an adversarial planning aspect added, in trying to manage or block overtaking;
this planning is often done in the presence of hidden information (position and
resources of other vehicles on different parts of the track) and under considerable time
pressure, and benefits from models of the adversarial drivers. 

One relatively early commercial game application that stands out is the AI for 
Forza Motorsport (Microsoft Studios, 2005), which was marketed under the name 
Drivatar 12591 . The Drivatar agents are built on a form of supervised lazy learning. 
To train the agents, humans drive a number of racing tracks, which are composed 
of a number of segments; all tracks in the game need to be composed of segments
drawn from the same “alphabet”. During driving, the agent selects the driving
commands that most closely approximate the racing line taken by the players on the
relevant segment. This approach was successful in realizing personalized driving 
agents, i.e., agents that could drive new tracks in the style of human players they 
had been trained on, but posed restrictions on the design of the tracks. 

There are various approaches to training agents to drive without supervision, 
through reinforcement learning. A sequence of papers has shown how neuroevolu- 
tion can be used to train agents that drive a single track in the absence of other cars 
as well as a good human driver 07071 , how incremental evolution can be used to
train agents with sufficiently general driving skills to drive unseen tracks Il709l , and
how competitive co-evolution can be used to adversarially train agents to drive more
or less aggressively in the presence of other cars 07080 . In all these experiments, the
weights of a relatively small fixed-topology network were trained with an evolution
strategy. The inputs to the network were the speed of the car and a handful of
rangefinder sensors that returned the distance to the edges of the track, or other cars. 
The low dimensionality of the resulting network enabled high-performing networks 
to be found relatively easily. 

The Simulated Car Racing Championship, which has run annually since 2007, is 
partly based on this work, and uses a similar sensor model. The first year, the
competition was based on a simple 2D racing game, and the winner of the competition
was a controller based on fuzzy logic ll710i . In 2008, the competition software was
rebuilt around TORCS, a 3D racing game with a reasonably sophisticated physics
model 113931 (see Fig. 3.11). In the following years, a large number of competitors
submitted agents based on various different architectures to the competition, includ- 
ing evolutionary computation, temporal difference learning, supervised learning and 
simple hand-coded rule-based systems M393I . A general trend has been observed 
over the course of the competition that the winning agents incorporate more and 
more domain knowledge in the form of hand-coded mechanisms, with learning al- 
gorithms generally only used for tuning parameters of these mechanisms. The best 
agents, such as COBOSTAR ll90l or Mr. Racer II543II generally drive as well as or 
better than a good human driver when driving alone on a track, but still struggle
with overtaking and other forms of adversarial driving. 

As discussed above, the Simulated Car Racing Championship provides infor- 
mation in a form that is relatively easy to map to driving commands, making the 
learning of at least basic driving strategy (but not fine-tuning) relatively easy. How- 
ever, some authors have attempted learning to drive from raw pixel data. Early work 
on this topic includes that by Floreano et al., who evolved a neural network with 
a movable “retina” to drive in a simple simulated environment. The output of the 
neural network included both driving commands and commands for how to move 
the retina, and only the relatively few pixels in the retina were used as inputs to the 
network II206I . Later, Koutnik et al. managed to evolve controllers that used higher- 
dimensional input by evolving the networks in compressed weight space; essentially,
the parameters of a JPEG encoding of the network connections were evolved,
allowing evolutionary search to work effectively in the space of large neural
networks 0531 . Supervised learning of deep networks has also been applied to visual
driving, yielding high-performing TORCS drivers that learn from examples Il795l . 

The examples above do not make use of a forward model of any kind. How- 
ever, car dynamics are relatively simple to model, and it is easy to create a fast 
approximate model for racing games. Given such a model, standard tree search
algorithms can easily be applied to car control. For example, Fischer et al. showed
that MCTS coupled with a simple forward model can produce decent performance 
in TORCS Il20l . 
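
As a rough illustration of how a forward model enables search-based car control, the sketch below plans steering with an exhaustive depth-limited search rather than MCTS, which keeps it short. It is not the method of Fischer et al.; the straight track, the kinematic model and all constants (STEERING, HALF_WIDTH, SPEED) are assumptions made for this example only.

```python
import math
from itertools import product

STEERING = (-0.3, 0.0, 0.3)   # hypothetical steering changes (radians per step)
HALF_WIDTH = 2.0              # hypothetical half-width of a straight track along the x-axis
SPEED = 1.0                   # constant speed in distance units per step


def forward_model(state, steer):
    """Approximate car dynamics: change heading, then move one step ahead."""
    x, y, heading = state
    heading += steer
    return (x + SPEED * math.cos(heading), y + SPEED * math.sin(heading), heading)


def evaluate(state):
    """Reward progress along the track and penalize distance from the centre line."""
    x, y, _ = state
    if abs(y) > HALF_WIDTH:
        return -100.0          # the car has left the track
    return x - abs(y)


def plan(state, depth=4):
    """Search all steering sequences of the given depth; return the best first action."""
    best_value, best_first = -math.inf, 0.0
    for sequence in product(STEERING, repeat=depth):
        simulated, value = state, 0.0
        for steer in sequence:
            simulated = forward_model(simulated, steer)
            value = evaluate(simulated)
            if value <= -100.0:    # stop simulating once the car is off the track
                break
        if value > best_value:
            best_value, best_first = value, sequence[0]
    return best_first


def drive(state, steps=30):
    """Receding-horizon control: replan at every step and execute one action."""
    for _ in range(steps):
        state = forward_model(state, plan(state))
    return state
```

Replanning at every step means the agent only ever commits to the first action of its best simulated sequence; replacing the exhaustive loop with sampled rollouts would move this sketch in the direction of MCTS.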


3.4.6 Shooters and Other First-Person Games 

First-person shooters have constituted an important genre of video games ever since
the success of DOOM (GT Interactive, 1993) and Wolfenstein 3D (Apogee Software
and FormGen, 1992) in the early 1990s. While a basic tenet of an FPS would seem
to be that the world is observed through a first-person point of view, there are games
that are generally recognized as FPSes, such as the Gears of War (Microsoft Studios,
2006-2016) series, which have the camera positioned slightly behind and/or above
the player. Similarly, the word “shooter” signifies that the games revolve around 
shooting projectiles with some kind of weapon. On that basis, a game such as Portal 
(Electronic Arts, 2007) can be seen as an FPS though it is debatable whether the 
player implement is actually a weapon. 

Shooters are often seen as fast-paced games where speed of perception and reac- 
tion is crucial, and this is true to an extent, although the speed of gameplay varies 
between different shooters. Obviously, quick reactions are in general not a prob- 
lem for a computer program, meaning that an AI player has a certain advantage 
over a human by default. But there are other cognitive challenges as well, including 
orientation and movement in a complex three-dimensional environment, predict- 
ing actions and locations of multiple adversaries, and in some game modes also 
team-based collaboration. If visual inputs are used, there is the added challenge of 
extracting relevant information from pixels.

There has been some early work on optimizing parameters for existing agents in 
order to improve their efficiency [127], but extensive work on AI for FPS games was
spurred by two competitions: first, the 2K BotPrize and more recently VizDoom. 


3.4.6.1 Unreal Tournament 2004 and the 2K BotPrize 

Unreal Tournament 2004 (UT2k4) (Epic Games, 2004) is an FPS which was released
in 2004, with what was at the time state-of-the-art graphics and gameplay.
While the game itself has not been open sourced, a team based at Charles
University in Prague created Pogamut, a Java-based API that allows for simple control
of the game [222]. Pogamut supplies the agent with an object-based information
interface, which the agent can query about the locations of objects and characters, and
also provides convenience functions for executing actions such as firing a projectile
towards a specific point.

Fig. 3.12 The first 2K BotPrize competition held in Perth, Australia, on 17 December 2008, as
part of the 2008 IEEE Symposium on Computational Intelligence and Games. (a) The judges’
room. (b) The players’ room.

Some work using UT2k4 tries to achieve high-performing agents for one or
several in-game tasks, using techniques such as neuroevolution. For example, van
Hoorn et al. subdivided the task of playing UT2k4 into three sub-tasks: shooting,
exploring and path-following [734]. Using an earlier approach [698], which combines
neuroevolution with the subsumption architecture of Rodney Brooks, they then
evolved neural networks for each of these tasks in succession. The resulting agent
was able to play some game scenarios relatively effectively.

However, the main use of the UT2k4 benchmark has been in the 2K BotPrize
(see Fig. 3.12). This competition, which ran from 2008 to 2014, stands out among
game-based AI competitions for focusing not on playing for performance, but rather
on playing for experience. Specifically, it was a form of Turing test, where submitted
agents were judged not by how well they survived firefights with other agents,
but by whether they could convince human judges (who in later configurations of the
competition also participated in the game) that they were human [262, 263, 264].

The winners of the final 2K BotPrize in 2014 were two teams whose bots managed
to convince more than half of the human judges that they (the bots) were
human. The first winning team, UT^2, from the University of Texas at Austin, is
primarily based on neuroevolution through multiobjective evolution [603]. It consists
of a number of separate controllers, most of which are based on neural
networks; at each frame, it cycles through all of these controllers, and uses a set of
priorities to decide which controller’s outputs will command various aspects
of the agent. In addition to neural networks, some controllers are built on different
principles, in particular the Human Retrace Controller, which uses traces of human
players to help the agent navigate out of stuck positions. The second winner,
MirrorBot by Mihai Polceanu, is built around the idea of observing other players in the
game and mirroring their behavior [535].
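
The priority-based arbitration described above can be sketched in a few lines. This is only an illustration of the general idea, not the UT^2 implementation: the three controllers are hand-written stand-ins for evolved networks, and every name, threshold and command string is hypothetical.

```python
from typing import Callable, Dict, List, Optional

Observation = Dict[str, float]
Proposal = Dict[str, str]                      # aspect of the agent -> command
Controller = Callable[[Observation], Optional[Proposal]]


def unstuck(obs: Observation) -> Optional[Proposal]:
    """Stand-in for a controller that replays recorded human paths when stuck."""
    if obs["speed"] < 0.1:
        return {"movement": "retrace_human_path"}
    return None


def combat(obs: Observation) -> Optional[Proposal]:
    """Stand-in for a combat controller that fires when an enemy is close."""
    if obs["enemy_distance"] < 20.0:
        return {"movement": "strafe", "weapon": "fire"}
    return None


def explore(obs: Observation) -> Optional[Proposal]:
    """Fallback controller that always proposes something for every aspect."""
    return {"movement": "go_to_next_waypoint", "weapon": "hold_fire"}


# Controllers listed from highest to lowest priority (an assumed ordering).
PRIORITISED: List[Controller] = [unstuck, combat, explore]


def arbitrate(controllers: List[Controller], obs: Observation) -> Proposal:
    """Query every controller each frame; per aspect, the highest priority wins."""
    commands: Proposal = {}
    for controller in controllers:
        proposal = controller(obs)
        if proposal is None:
            continue
        for aspect, command in proposal.items():
            commands.setdefault(aspect, command)   # keep earlier (higher-priority) choice
    return commands
```

Because arbitration is done per aspect, a stuck agent can follow the retrace controller for movement while the combat controller still decides when to fire.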


3.4.6.2 Raw Screen Inputs and the Visual Doom AI Challenge


The VizDoom framework [331] is built around a version of the classic DOOM (GT
Interactive, 1993) FPS game that allows researchers to develop AI bots that play
the game using only the screen buffer. VizDoom was developed as an AI testbed
by a team of researchers at the Institute of Computing Science, Poznan University
of Technology (see Fig. 3.13). The framework includes several tasks of varying
complexity, from health pack collection and maze navigation to all-out deathmatch.
An annual competition based on VizDoom has been held at the IEEE CIG conference since
2016, and the framework is also included in OpenAI Gym, a collection of games
which can be used for AI research.

Most of the published work on VizDoom has been based on deep reinforcement 
learning with convolutional neural networks, given that method’s proven strength in 
learning to act based on raw pixel inputs. For example, Arnold, a well-performing
agent in the first VizDoom competition, is based on deep reinforcement learning of
two different networks, one for exploration and one for fighting.

But it is also possible to use evolutionary computation to train neural network 
controllers. As the very large input size requires huge networks, which do not work 
well with evolutionary optimization in general, it is necessary to compress the
information somehow. This can be done by using an autoencoder trained on the visual
stream as the game is played; the activations of the bottleneck layer of the
autoencoder can then be used as inputs to a neural network that decides on the actions,
and the weights of the neural network can be evolved. Previous attempts at
evolving controllers acting on visual input in the related game Quake (GT Interactive,
1996) have met with only limited success [519].
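
The pipeline of compressing the screen and then evolving a small controller on the compressed code can be sketched as follows. This is a toy under loud assumptions: the encode function is a fixed block-averaging stand-in for a trained autoencoder bottleneck, the "screen" is a 64-pixel vector with one bright pixel, the controller is linear, and the optimizer is a simple (1+1) evolution strategy. None of this is taken from the cited work.

```python
import random

FRAME_SIZE, CODE_SIZE, ACTIONS = 64, 4, 2   # hypothetical sizes for this toy example


def encode(frame):
    """Stand-in for an autoencoder bottleneck: mean brightness of CODE_SIZE blocks."""
    block = FRAME_SIZE // CODE_SIZE
    return [sum(frame[i * block:(i + 1) * block]) / block for i in range(CODE_SIZE)]


def act(weights, code):
    """Linear controller: choose the action with the largest weighted sum."""
    scores = []
    for a in range(ACTIONS):
        row = weights[a * CODE_SIZE:(a + 1) * CODE_SIZE]
        scores.append(sum(w * c for w, c in zip(row, code)))
    return scores.index(max(scores))


def make_samples(rng, n=200):
    """Toy screens with one bright pixel; the target action is its side (0 left, 1 right)."""
    samples = []
    for _ in range(n):
        pos = rng.randrange(FRAME_SIZE)
        frame = [0.0] * FRAME_SIZE
        frame[pos] = 1.0
        samples.append((encode(frame), 0 if pos < FRAME_SIZE // 2 else 1))
    return samples


def fitness(weights, samples):
    """Fraction of encoded frames on which the controller picks the target action."""
    hits = sum(act(weights, code) == target for code, target in samples)
    return hits / len(samples)


def evolve(samples, rng, generations=300, sigma=0.5):
    """(1+1) evolution strategy: keep a Gaussian-mutated child if it is no worse."""
    parent = [rng.gauss(0.0, 1.0) for _ in range(ACTIONS * CODE_SIZE)]
    parent_fit = fitness(parent, samples)
    for _ in range(generations):
        child = [w + rng.gauss(0.0, sigma) for w in parent]
        child_fit = fitness(child, samples)
        if child_fit >= parent_fit:
            parent, parent_fit = child, child_fit
    return parent, parent_fit


if __name__ == "__main__":
    rng = random.Random(0)
    weights, score = evolve(make_samples(rng), rng)
    print(f"fraction of frames handled correctly: {score:.2f}")
```

The point of the compression step is visible in the genome length: evolution searches over 8 weights here instead of one weight per pixel and action.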


Fig. 3.13 A screenshot from the VizDoom framework which, at the moment of writing, is used
in the Visual Doom AI Challenge. The framework gives access to the depth of the level
(enabling 3D vision). Image obtained from http://vizdoom.cs.put.edu.pl/ with permission.


3.4.7 Serious Games

The genre of serious games, or games with a purpose beyond entertainment, has
become a focus domain of recent studies in game AI. One could argue that most
existing games are serious by nature as they incorporate some form of learning
for the player during play. Games such as Minecraft (Mojang, 2011), for example,
were not designed with a particular learning objective in mind; nevertheless they
have been used broadly in classrooms for science education. Further, one could
argue that serious games do not have a particular genre of their own; games may
have a purpose regardless of the genre they were designed in. Strictly speaking,
the design of serious games involves a particular set of learning objectives.
Learning objectives may be educational objectives such as those considered in STEM
education; a popular example of such a game is the Dragonbox (WeWantToKnow,
2011) series that teaches primary school students equation solving skills, and basic
addition and subtraction skills. (A serious academic effort on game-based STEM
education is the narrative-centered Crystal Island game series for effective science
learning [577, 584].) The learning objective can, instead, be the training of social
skills such as conflict resolution and social inclusion through games; Village Voices
[336] (see Fig. 3.14(a)), My Dream Theater [100] (see Fig. 3.14(b)), and Prom Week
[447] are examples of such soft skill training games. Alternatively, the aim could be
that war veterans suffering from post-traumatic stress disorder are trained to cope
with their cognitive-behavioral manifestations when faced with in-game stressors in
games such as StartleMart [272, 270] and Virtual Iraq [227]. The learning objective
could also be that of soliciting collective intelligence for scientists. A number
of scientific games have recently led to the discovery of new knowledge via
crowd-playing (or else human computation); arguably one of the most popular scientific
discovery games is Foldit, through which players collectively discovered a
novel algorithm for protein folding.

The cognitive and emotional skills required to play a serious game largely depend
on the game and the underlying learning objectives. A game about math would
normally require computation and problem solving skills. A game about stress
inoculation and exposure therapy would instead require cognitive-behavioral coping
mechanisms, metacognition and self-control of cognitive appraisal. The breadth of
cognitive and emotional skills required from players is as wide as the number of
different learning objectives a serious game can integrate into its design.

Fig. 3.14 The Village Voices and the My Dream Theater games for conflict resolution. Village
Voices realizes experiential learning of conflict in social, multi-player settings, whereas My Dream
Theater offers a single-player conflict management experience. In the first game the AI takes the
role of modeling conflict and generating appropriate quests for the players, whereas (more
relevant to the aims of this chapter) in My Dream Theater AI takes the role of controlling expressive
agent (NPC) behaviors. (a) Village Voices. (b) My Dream Theater, screenshot image adapted
from [99].

Many serious games have NPCs, and AI can help in making those NPCs believable,
human-like, social and expressive. AI in serious games is generally useful
for modeling NPC behavior and playing the game as an NPC, not for winning
but rather for the experience of play. Whether the game is for education, health or
simulation purposes, NPC agents need to act believably and emotively in order to
empower learning or boost the engagement level of the game. Years of active research
have been dedicated to this task within the fields of affective computing and virtual
agents. The usual approach followed is the construction of top-down (ad-hoc
designed) agent architectures that represent various cognitive, social, emotive and
behavioral abilities. The focus has traditionally been on both the modeling of the
agent’s behavior and its appropriate expression under particular contexts. A
popular way of constructing a computational model of agent behavior is to base it
on a theoretical cognitive model such as the OCC model 0512lll83l[T6lll89l 12371,
which attempts to effect human-like decision making, appraisal and coping
mechanisms dependent on a set of perceived stimuli. For the interested reader, Marsella
et al. [428] cover the most popular computational models of emotion for agents in a
thorough manner.

Expressive and believable conversational agents such as Greta [534] and Rea,
or virtual humans that embody affective manifestations, can be considered
in the design of serious games. The use of such character models has been dominant
in the domains of intelligent tutoring systems [131], embodied conversational
agents [104], and affective agents [238] for educational and health purposes.
Notable examples of such agent architecture systems include the work of Lester’s
group in the Crystal Island game series [578, 577, 584], the expressive agents of
Prom Week [447] and Façade [441], the agents of the World of Minds [190] game,
and the FAtiMA [168]-enabled agents of My Dream Theater [100].


3.4.8 Interactive Fiction 

While there are several variations, games within the interactive fiction genre
normally contain a fantasy world consisting of smaller areas such as rooms; however,
a simulated environment is not a necessity. Importantly, players need to use text
commands to play the game. The player can normally interact with the objects and
available game characters, collect objects and store them in her inventory, and solve
various puzzles. Games of this genre are also named text-based adventure games or
often associated with text-based role-playing games. Popular examples include the
games of the Zork series (Infocom, 1979-1982) and Façade.

In this game genre, AI can play the role of understanding text as coming from
players in a natural language format. In other words, the game AI can feature
natural language processing (NLP) for playing the game as the player’s companion
or as an opponent. Further, NLP can be used as an input for the generation of a
dialog, a text or a story in an interactive fashion with the player. It is normally the
case that text-based input is used to drive a story (interactive narrative) which is often
communicated via embodied conversational agents and is represented through
the lens of a virtual camera. Similarly to traditional cinematography, both the camera
position and the communicated narrative contribute to the experience of the
viewer. In contrast to traditional cinematography, however (but similarly to interactive
drama), the story in games can be influenced by the player herself. It goes without
saying that research in text-based games is naturally interwoven with research
in believable conversational agents (as covered in the previous section),
computational and interactive narrative [693, 441, 562, 792] and virtual cinematography
[252, 193, 300, 84, 578]. The discussion on the interplay between interactive
narrative and virtual cinematography is expanded in Chapter 4 and in particular in
the section dedicated to narrative generation.

Work on text-based AI in games starts from the early language-based interaction
with Eliza [751] and the Z-Machine used by text adventure games such as Zork I
(Infocom, 1980), to Façade [438, 441] and to recent word2vec [459] approaches
(e.g., its TensorFlow implementation [2]) for playing Q&A games [340] and
text-based adventure games [352]. It is important to note that beyond the use of AI to
understand natural language we can use AI to play text-based games. A notable recent
example of an agent that manages to handle both tasks is the one developed by
Kostka et al., named Golovin. The Golovin agent uses related corpora such
as fantasy books to create language models (via word2vec [459]) that are appropriate
to this game domain. To play the game the agent uses five types of command
generators: battle mode, gathering items, inventory commands, general actions and
movement. Golovin is validated on 50 interactive fiction games, demonstrating
comparable performance to the current state of the art. Another example is the agent
of Narasimhan et al. [475] that plays multi-user dungeon games, a form of
multi-player or collaborative interactive fiction. Their agent converts text representations
to state representations using Long Short-Term Memory (LSTM) networks. These
representations feed a deep Q network which learns approximate evaluations for
each action in a given game state [475]. Their approach significantly outperforms
other baselines in terms of number of completed quests in small-, and even
medium-sized, games. For the interested reader, a dedicated annual competition on
text-based adventure game AI was initiated in conjunction with the IEEE CIG conference in
2016 (http://atkrye.github.io/IEEE-CIG-Text-Adventurer-Competition/). Participants of the
competition submit agents that play games for the Z-Machine.

Examples of useful development tools for text-based games include the Inform
series of design systems and programming languages, which are inspired by the
Z-machine and have led to the development of several text-based games and interactive
fiction based on natural language. Notably, Inform 7 was used for the design of
the Mystery House Possessed (Emily Short, 2005) game.


3.4.9 Other Games 

The list of games AI can play is not limited to the genres covered above. While the 
genres we covered in more detail are, in our opinion, the most representative, there
are AI studies focusing on other game genres, a number of which we outline below. 

A game type attracting increasing interest is casual games, due to their growing
popularity and accessibility via mobile devices in recent years. Casual games are
often simple and are designed as short episodes of play (levels) to allow flexibility
with respect to gameplay time. This feature gives the player the ability to conclude
an episode in a short period of time without needing to save the game. As a result,
the player can engage on a single level for a few seconds, or play a number
of levels throughout the day, or instead repeatedly play new levels up to hours of
gameplay. The game skills required to play casual games depend on the genre of
the casual game, which can vary from puzzle such as Bejeweled (PopCap Games,
2001), Angry Birds (Chillingo, 2009), and Cut the Rope (Chillingo, 2010), to adventure
such as Dream Chronicles (KatGames, 2007), to strategy such as Diner Dash
(PlayFirst, 2004), to arcade such as Plants vs. Zombies (PopCap Games, 2009) and
Feeding Frenzy (PopCap Games, 2004), to card and board games such as Slingo
Quest (Funkitron, Inc., 2006).

Notable academic efforts on casual games include the work of Isaksen et al.
[288, 289] where an AI agent is built to test the difficulty of Flappy Bird
(dotGEARS, 2013) levels. The baseline AI player follows a simple pathfinding
algorithm that performs well in completing the Flappy Bird levels. To imitate human
play, however, the AI player features elements of human motor skills such as
precision, reaction time, and actions per second. Another example on that line of work
is the use of AI agents to test the generation of game levels in a variant of Cut
the Rope (Chillingo, 2010). The AI agents featured in the Ropossum authoring tool
both perform automatic playtesting and optimize the playability of a level using
first-order logic [614]. The level generation elements of Ropossum are further
discussed in the next chapter. Another casual game that has recently attracted the
interest of game AI research is Angry Birds (Chillingo, 2009). The game has an
established AI competition [560], named the Angry Birds AI Competition, that
has run since 2012 mainly in conjunction with the International Joint Conference on
Artificial Intelligence. AI approaches in Angry Birds (Chillingo, 2009) have so far
focused mainly on planning and reasoning techniques. Examples include a qualitative
spatial reasoning approach which evaluates level structural properties and
game rules, and infers which of these are satisfied for each building block of the
level [796]. The usefulness of each level building block (i.e., how good it is to hit
it) is then computed based on these requirements. Other approaches model discrete
knowledge about the current game state of Angry Birds (Chillingo, 2009) and then
attempt to satisfy the constraints of the modeled world based on extensions of
answer set programming ll69ll.

Beyond casual games, the genre of fighting games has received a considerable
amount of interest from both academic and industrial players. Fighting games require
cognitive skills mostly related to kinesthetic control and spatial navigation
but also related to reaction times and decision making, both of which need to be
fast [349]. Popular approaches for fighting games include classic reinforcement
learning (in particular, the SARSA algorithm for on-policy learning of Q values
which are represented by linear and ANN function approximators) as applied by
the Microsoft Research team in the Tao Feng: Fist of the Lotus (Microsoft Game
Studios, 2003) game [235]. Reinforcement learning has also been applied with
varying degrees of success for adaptive difficulty adjustment in fighting games
111581156111271, i.e., AI that plays for the experience of the player. Evolutionary
reinforcement learning variants have also been investigated for the task. A notable
effort in the fighting games AI research scene has been on the Java-based fighting
game FightingICE provided by the Fighting Game AI Competition [402] (see Fig.
3.15). The competition is organized by Ritsumeikan University in Japan and has
run since 2013 with the aim to derive the best possible fighting bot, i.e., AI that
plays to win. Approaches using FightingICE vary from dynamic scripting,
to k-nearest neighbor [760], to Monte Carlo tree search [790], to neuroevolution
[357], among others. So far MCTS-based approaches appear to be advantageous for
winning in fighting games.
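
For readers unfamiliar with SARSA under linear function approximation, the update mentioned above can be sketched as follows. This is a generic textbook-style sketch, not the Tao Feng implementation; the class name, feature vectors and hyperparameter values are all assumptions chosen for illustration.

```python
import random


class LinearSarsa:
    """SARSA with one linear weight vector per action: Q(s, a) = w_a . phi(s)."""

    def __init__(self, n_features, n_actions, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
        self.weights = [[0.0] * n_features for _ in range(n_actions)]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def q(self, features, action):
        """Estimated value of taking `action` in the state described by `features`."""
        return sum(w * f for w, f in zip(self.weights[action], features))

    def choose(self, features):
        """Epsilon-greedy action selection over the current Q estimates."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.weights))
        values = [self.q(features, a) for a in range(len(self.weights))]
        return values.index(max(values))

    def update(self, features, action, reward, next_features, next_action, done):
        """On-policy TD update: the target uses the action actually chosen next."""
        target = reward
        if not done:
            target += self.gamma * self.q(next_features, next_action)
        td_error = target - self.q(features, action)
        for i, f in enumerate(features):
            self.weights[action][i] += self.alpha * td_error * f
        return td_error
```

In a game loop, the agent would call choose for the current state, act, observe the reward and next state, call choose again, and then pass both state-action pairs to update; using the next action that will really be executed is what makes the method on-policy.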

Fig. 3.15 A screenshot from the Java-based FightingICE framework for fighting games.

The last game we will cover in this section is Minecraft (Mojang, 2011) for its
unique properties as a testbed for game AI research. Minecraft (Mojang, 2011) is a
sandbox game played in a 3D procedurally generated world that players can navigate
through. The game features game mechanics that enable players to build constructions
out of cubes (see Fig. 3.16) but it does not have a specific goal for the
player to accomplish. Beyond exploration and building, players can also gather
resources, craft objects and combat opponents. The game has sold more than 121 million
copies across all platforms, making it the second best-selling video game
of all time, only behind Tetris (Alexey Pajitnov and Vladimir Pokhilko, 1984).
The 3D open-world nature of Minecraft (Mojang, 2011) and the lack of specific
goals provide players ultimate freedom to play and explore the world in dissimilar
ways. The benefits of playing Minecraft (Mojang, 2011) appear to be many and
some of them have already been reported by the educational research community
[479]. For example, the blocks available in the game can be arranged to produce
any object a player might think of, thereby fostering the player’s creativity [479]
and diagrammatic lateral thinking [774]. Further, the blocks’ functionalities that
can be combined and extended may result in new knowledge for the players that is
acquired gradually. Moreover, the simple stylized voxel-based graphics of the game
allow the player to concentrate on the gameplaying and exploration tasks within a
simple, yet aesthetically pleasing, environment. Overall, the multitude of reasons
that make Minecraft (Mojang, 2011) so appealing to millions of players are also the
reasons that make the game a great testbed for AI and game AI research. In particular,
the game offers an AI player an open world ready to be explored with numerous
possibilities for open-ended play. Further, in-game tasks for an AI agent vary from
exploration and seeking treasure to making objects and building structures, alone or
as a team of agents.

Fig. 3.16 A screenshot from Minecraft (Mojang, 2011) showcasing a city hall structure made by
the MCFRArchitect Build Team. Image obtained from Wikipedia (fair use).

A notable and recent effort on the use of Minecraft (Mojang, 2011) for AI is
Project Malmo [305], which is supported by Microsoft Research. Project Malmo is
a Java-based AI experimentation platform built as a game mod of the original game
and designed to support research within the areas of robotics, computer vision,
machine learning, planning, multi-agent systems and general game AI [305]. Note
that the open-source platform is accessible via GitHub. Early experiments with AI
in Project Malmo include the application of deep neural networks for navigating the
3D mazes [465] and for fighting the opponents of the game [726]. Beyond Project
Malmo, it is also worth noting that various other mods of the game have been used
directly for teaching robotics (including algorithms for maze navigation and
planning) and teaching general AI methods.


3.5 Further Reading 

The methods used for playing games have an extensive body of literature that is
covered in detail both in Chapter 2 and here. The different game genres that we
covered in this chapter contain a corresponding literature the interested reader could
use as a starting point for further exploration.


3.6 Exercises

Prior to moving to the next chapter about generating content, we provide here a
general exercise that allows you to apply any of the algorithms covered in Chapter 2
in a single domain. The purpose of the exercise is for you to get familiar with those
methods before testing them in more complex domains and tasks.

As discussed above, the Ms Pac-Man vs Ghost Team competition is a contest held
at several AI conferences around the world in which AI controllers for Ms Pac-Man
and the ghost team compete for the highest ranking. For this exercise, you will have
to develop a number of Ms Pac-Man AI players to compete against the ghost team
controllers included in the software package. This is a simulator entirely written in
Java with a well-documented interface. While the selection of two to three different
agents within a semester period has been shown to be a good educational practice,
we will leave the final number of Ms Pac-Man agents to be developed up to you or
your class instructor.

The website of the book contains code for the game and a number of different
sample code classes in Java to get you started. All AI methods covered in Chapter 2
are applicable to the task of controlling Ms Pac-Man. You might find out, however,
that some of them are more relevant and efficient than others. So which methods
would work best? How do they compare in terms of performance? Which state
representation should you use? Which utility function is the most appropriate? Do
you think you can make the right implementation decisions so that your Ms Pac-Man
plays at a professional level? How about at a world champion, or even at a
superhuman level?


3.6.1 Why Ms Pac-Man? 

Some of our readers might object to this choice of game and think that there should
be more interesting games for AI to play. While Ms Pac-Man (Namco, 1982)
is very old, it is arguably a classic game that is still fun and challenging to play,
as well as a simple testbed to start experimenting with the various AI methods and
approaches introduced in this chapter and the previous one. Ms Pac-Man (Namco,
1982) is simple to understand and play but is not easy to master. This seemingly
contradictory combination of play simplicity and problem complexity makes Ms Pac-Man
(Namco, 1982) the ideal testbed for trying out different AI methodologies for
controlling the main character, Ms Pac-Man. Another exciting feature of this game is its
non-deterministic nature. Randomness not only augments the fun factor of the game
but it also increases the challenge for any AI approach considered. As discussed,
another argument for the selection of Ms Pac-Man (Namco, 1982) is that the game and
its variants have been very well studied in the literature, tested through several game
AI competitions, but also covered for years in AI (or game AI) courses in several
universities across the globe. The reader is also referred to a recent survey paper
by Rohlfshagen et al. [573], covering over 20 years of research in Pac-Man and a
YouTube video about the importance of Pac-Man in game AI research.


3.7 Summary 

In this chapter we discussed the different roles AI can take and the different char- 
acteristics games and AI methods have, the various methods that are available for 
playing games and the different games it can play. In particular, AI can either play to 
win or play in order to create a particular experience for a human player or observer. 
The former goal involves maximizing a utility that maps to the game performance 
whereas the latter goal involves objectives beyond merely winning such as engage- 
ment, believability, balance and interestingness. AI as an actor can take the role of 
either a player character or a non-player character that exists in a game. The charac- 
teristics of games an AI method needs to consider when playing include the number 
of players, the level of stochasticity of the game, the amount of observability
available, the action space and the branching factor, and the time granularity. Further,
when we design an algorithm to play a game we also need to consider algorith- 
mic aspects such as the state representation, the existence of a forward model, the 
training time available, and the number of games AI can play. 

The above roles and characteristics were detailed in the first part of the chapter as
they are important and relevant regardless of the AI method applied. When it comes
to the methods covered in this chapter, we focused on tree-search, reinforcement
learning, supervised learning, and hybrid approaches for playing games. In a sense,
we tailored the methods outlined in Chapter 2 to the task of gameplaying. The chapter
concluded with a detailed review of the studies and the relevant methodologies
on a game genre basis. Specifically, we saw how AI can play board, card, arcade,
strategy, racing, shooter, and serious games, interactive fiction, and a number of
other game genres such as casual and fighting games.


Chapter 4

Generating Content 


Procedural content generation (PCG) [616] is an area of game AI that has seen
an explosive growth of interest. While games that incorporate some procedurally
generated content have existed since the early 1980s (in particular, the dungeon
crawler Rogue (Toy and Wichmann, 1980) and the space trading simulator Elite
(Acornsoft, 1984) are early trailblazers), research interest in academia has really
picked up within the second half of the last decade.

Simply put, PCG refers to methods for generating game content either autonomously
or with only limited human input. Game content is that which is contained
in a game: levels, maps, game rules, textures, stories, items, quests, music,
weapons, vehicles, characters, etc. Typically, NPC behavior and the game engine
itself are not thought of as content. Probably the most common current usage of
PCG is for generating levels and terrains, but in the future we might see widespread
generation of all kinds of content, possibly even complete games.

Seeing PCG from an AI perspective, content generation problems are AI problems
where the solutions are content artifacts (e.g., levels) that fulfill certain
constraints (e.g., being playable, having two exits) and/or maximize some metrics (e.g.,
length, difference in outcomes between different strategies). And as we will see,
many AI methods discussed in Chapter 2 can be used for PCG purposes, including
evolutionary algorithms and neural networks. But there are also a number of methods
that are commonly used for PCG that are not typically thought of as AI; some
of these will be presented in this chapter.

This chapter is devoted to methods for generating game content, as well as
paradigms for how to incorporate them into games. We start by discussing why
you would want to use PCG at all; just as for playing games, there are some very
different motivations for generating content. Then in Section 4.2 we present a general
taxonomy for PCG methods and their possible roles in games. Next, Section 4.3
summarizes the most important methods for the generation of content. Shifting attention
to which roles PCG can take in games, Section 4.4 discusses ways of involving
designers and players in the generative process. Section 4.5 takes the perspective of
content types, and presents examples of generating some common and uncommon
types of game content. Finally, Section 4.6 discusses how to evaluate PCG methods.


© Springer International Publishing AG, part of Springer Nature 2018 
G. N. Yannakakis and J. Togelius, Artificial Intelligence and 
Games, https://doi.org/10.1007/978-3-319-63519-4_4 
4.1 Why Generate Content?


Perhaps the most obvious reason to generate content is that it could remove the need
to have a human designer or artist creating that content. Humans are expensive and
slow, and it seems we need more and more of them all the time. Ever since computer
games were invented, the number of person-months that go into the development of
a successful commercial game has increased more or less constantly.* It is now
common for a game to be developed by hundreds of people over a period of several
years. This leads to a situation where fewer games are profitable, and fewer developers
can afford to develop a game, leading in turn to less risk-taking and less diversity
in the games marketplace. Many of the costly employees necessary in this process
are designers and artists rather than programmers. A game development company
that could replace some of the artists and designers with algorithms would have a
competitive advantage, as games could be produced faster and cheaper while preserving
quality. (This argument was made forcefully by legendary game designer
Will Wright in his talk “The Future of Content” at the 2005 Game Developers Conference,
a talk which helped reinvigorate interest in procedural content generation.)


Figure 4.1 illustrates a cost breakdown of an average AAA game and showcases
the dominance of artwork and marketing in that process. Art, programming, and debugging
constitute around 50% of the cost of an AAA game. Essentially, PCG can
assist in the processes of art and content production, thus directly contributing to the
reduction of around 40% of a game’s cost.

Of course, threatening to put them out of their jobs is not a great way to sell PCG
to designers and artists. It is also true that at the current stage of technology, we are
far from being able to replace all that a designer or artist can do. We could therefore
turn the argument around: content generation, especially embedded in intelligent design
tools, can augment the creativity of individual human creators. Humans, even
those of the “creative” vein, tend to imitate each other and themselves. Algorithmic
approaches might come up with radically different content than a human would create,
by offering an unexpected but valid solution to a given content generation
problem. Some evidence for this is already available in the literature [774]. This
could make it possible for small teams without the resources of large companies,
and even for hobbyists, to create content-rich games by freeing them from worrying
about details and drudge work while retaining overall directorship of the games.
PCG can then provide a way of democratizing game design by offering reliable and
accessible ways for everyone to make better games in less time.

Both of these arguments assume that what we want to make is something like the
games we have today. But PCG methods could also enable completely new types
of games. To begin with, if we have software that can generate game content at the
speed it is being “consumed” (played), there is in principle no reason why games
need to end. For everyone who has ever been disappointed by their favorite game
not having any more levels to clear, characters to meet, areas to explore, etc., this is
an exciting prospect.

* At least, this is true for “AAA” games, which are boxed games sold at full price worldwide. The
recent rise of mobile games seems to have made single-person development feasible again, though
average development costs are rising on that front too.

Fig. 4.1 Average cost breakdown estimate of AAA game development. Data source:
http://monstervine.com/2013/06/chasing-an-industry-the-economics-of-videogames-turned-hollywood/.

Even more excitingly, the newly generated content can be tailored to the tastes
and needs of the player playing the game. By combining PCG with player modeling,
for example through measuring and using neural networks to model the response of
players to individual game elements, we can create player-adaptive games that
seek to maximize the enjoyment of players (see the roles of PCG in Section 4.4).
The same techniques could be used to maximize the learning effects of a serious
game, or perhaps the addictiveness of a “casual” game.

Finally, a completely different but no less important reason for developing PCG
methods is to understand design and creativity. Computer scientists are fond of
saying that you do not really understand a process until you have implemented it in
code (and the program runs). Creating software that can competently generate game
content could help us understand the process by which we “manually” generate
the content, and clarify the affordances and constraints of the design problem we
are addressing. This is an iterative process, whereby better PCG methods can lead
to better understanding of the design process, which in turn can lead to better PCG
algorithms, enabling a co-creativity process between designers and algorithms [774].

Fig. 4.2 An illustration of the PCG taxonomy discussed in Section 4.2.


4.2 Taxonomy 


With the variety of content generation problems, methods and approaches that are
available, it helps to have a structure that can highlight the differences and similarities
between them. In the following, we introduce a revised version of the taxonomy
of PCG that was originally presented by Togelius et al. [720]. It consists of a number
of dimensions, where an individual method or solution should usually be thought of
as lying somewhere on a continuum between the ends of that dimension. Beyond
the taxonomy for the content types (Section 4.2.1) and the properties of the PCG
methods, which we cover in Section 4.2.2, we provide an outline of the roles a PCG
algorithm can take, which we cover extensively in Section 4.4.


4.2.1 Taxonomy for Content 

We identify the type of the generated outcome as the sole dimension that relates to
content in this taxonomy (see also Fig. 4.2).
4.2.1.1 Type: Necessary Versus Optional

PCG can be used to generate necessary game content that is required for the completion
of a level, or it can be used to generate optional content that can be discarded
or exchanged for other content. The main distinguishing feature between necessary
and optional content is that necessary content should always be correct while this
condition does not hold for optional content. Necessary content needs to be consumed
or passed as the player makes their way through the game, whereas optional content
can be avoided or bypassed. An example of optional content is the generation of different
types of weapons in first-person shooter games [240] or the auxiliary reward
items in Super Mario Bros (Nintendo, 1985). Necessary content can be the main
structure of the levels in Super Mario Bros (Nintendo, 1985), or the collection of
certain items required to pass to the next level.


4.2.2 Taxonomy for Methods 

PCG algorithms can be classified according to a number of properties such as their
level of controllability, determinism and so on. This section outlines the three dimensions
across which PCG methods can be classified. Figure 4.2 offers an illustration
of the taxonomy that we discuss in this section.


4.2.2.1 Determinism: Stochastic Versus Deterministic 

Our first distinction with regard to PCG methods concerns the amount of randomness
in content generation. The right amount of variation in outcomes between different
runs of an algorithm with identical parameters is a design decision. Stochasticity**
allows an algorithm (such as an evolutionary algorithm) to offer great variation,
necessary for many PCG tasks, at the cost of controllability. While content
diversity and expressivity are desired properties of generators, the effect of randomness
on the final outcomes can only be observed and controlled after the fact. Completely
deterministic PCG algorithms, on the other hand, can be seen as a form of
data compression. A good example of deterministic PCG is the first-person shooter
.kkrieger (.theprodukkt, 2004), which manages to compress all of its textures, objects,
music and levels together with its game engine in just 96 KB of storage space.


** Strictly speaking there is a distinction between a stochastic and a non-deterministic process in
that the former has a defined random distribution whereas the latter does not. For the purposes of
this book, however, we use the two terms interchangeably.
4.2.2.2 Controllability: Controllable Versus Non-controllable

The generation of content by PCG can be controlled in different ways. The use of
a random seed is one way to gain control over the generation space; another way
is to use a set of parameters that control the content generation along a number
of dimensions. Random seeds were used when generating the world in Minecraft
(Mojang, 2011), which means the same world can be regenerated if the same seed
is used [755]. A vector of content features was used in [617] to generate levels for
Infinite Mario Bros (Persson, 2008) that satisfy a set of feature specifications.
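
The two control mechanisms can be combined in one generator. The following is a
minimal sketch in Python, assuming a hypothetical one-dimensional level made of
ground and gap tiles: the seed fixes the random stream, so the same level can be
regenerated, while the parameters length and gap_rate steer the output along two
chosen dimensions. The function and parameter names are illustrative only and are
not taken from Minecraft or from the cited studies.

```python
import random

def generate_level(seed, length=40, gap_rate=0.2):
    """Return a level string: '#' is ground, '.' is a gap."""
    rng = random.Random(seed)  # private generator: the seed alone fixes the stream
    tiles = ["#"]              # always start on solid ground
    for _ in range(length - 1):
        # never place two gaps in a row, so every gap is exactly one tile wide
        if tiles[-1] == "#" and rng.random() < gap_rate:
            tiles.append(".")
        else:
            tiles.append("#")
    return "".join(tiles)

print(generate_level(seed=42))
print(generate_level(seed=42) == generate_level(seed=42))  # same seed, same level
```

Storing only the seed and the parameter values is then enough to reproduce a level,
which is the sense in which a fully deterministic generator acts as data compression.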


4.2.2.3 Iterativity: Constructive Versus Generate-and-Test

A final distinction may be made between algorithms that can be called constructive
and those that can be described as generate-and-test. A constructive algorithm
generates the content once, and is done with it; however, it needs to make sure that
the content is correct or at least “good enough” as it is being constructed. An example
of this approach is using fractals or cellular automata to generate terrains
or grammars to generate levels (also refer to the corresponding PCG method sections
below). A generate-and-test algorithm, instead, incorporates both a generate
and a test mechanism. After a candidate content instance is generated, it is tested
according to some criteria (e.g., is there a path between the entrance and exit of the
dungeon, or does the tree have proportions within a certain range?). If the test fails,
all or some of the candidate content is discarded and regenerated, and this process
continues until the content is good enough. A popular PCG framework that builds
upon the generate-and-test paradigm is the search-based [720] approach discussed
in Section 4.3.
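
The generate-and-test loop can be sketched in a few lines of Python. The example
below is an illustration under stated assumptions, not a method from the cited
literature: the "dungeon" is a hypothetical random grid of floor and wall tiles,
the test is a breadth-first search for a path between entrance and exit, and a
failed candidate is discarded as a whole. All names and the attempt budget are
chosen for this sketch.

```python
import random
from collections import deque

def generate_candidate(rng, width=12, height=8, wall_rate=0.35):
    """Generate step: a random grid of floor (' ') and wall ('#') tiles."""
    grid = [["#" if rng.random() < wall_rate else " " for _ in range(width)]
            for _ in range(height)]
    grid[0][0] = " "                    # entrance, top-left corner
    grid[height - 1][width - 1] = " "   # exit, bottom-right corner
    return grid

def has_path(grid):
    """Test step: breadth-first search from the entrance to the exit."""
    height, width = len(grid), len(grid[0])
    goal = (height - 1, width - 1)
    seen = {(0, 0)}
    queue = deque([(0, 0)])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            return True
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < height and 0 <= nc < width
                    and grid[nr][nc] == " " and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return False

def generate_and_test(seed, max_attempts=1000):
    """Regenerate whole candidates until one passes the test."""
    rng = random.Random(seed)
    for attempt in range(1, max_attempts + 1):
        candidate = generate_candidate(rng)
        if has_path(candidate):
            return candidate, attempt
    raise RuntimeError("no valid dungeon found within the attempt budget")

dungeon, attempts = generate_and_test(seed=1)
print("\n".join("".join(row) for row in dungeon))
print("accepted after", attempts, "attempt(s)")
```

A variant that regenerates only part of a failed candidate would keep the same
structure but repair the grid instead of replacing it.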


4.2.3 Taxonomy of Roles

In this section we identify and briefly outline the four possible roles a PCG algorithm
can take in the game design process, classified across the dimensions of autonomy
and player-based adaptivity. The various PCG roles are illustrated in Fig. 4.2 and
are extensively discussed in Section 4.4.


4.2.3.1 Autonomy: Autonomous Versus Mixed-Initiative 

The generative process that does not consider any input from the human designer is
defined as autonomous PCG whereas mixed-initiative PCG refers to the process
that involves the human designer in the creative task. Both roles are discussed in
further detail in Section 4.4.
4.2.3.2 Adaptivity: Experience-Agnostic Versus Experience-Driven

Experience-agnostic content generation refers to the paradigm of PCG where content
is generated without taking player behavior or player experience into account,
as opposed to experience-driven [783], adaptive, personalized or player-centered
content generation where player interaction with the game is analyzed and content
is created based on a player’s previous behavior. Most commercial games tackle
PCG in a generic, experience-agnostic way, while experience-driven PCG has been
receiving increasing attention in academia. Recent extensive reviews of PCG for
player-adaptive games can be found in [783, 784].


4.3 How Could We Generate Content? 

There are many different algorithmic approaches to generating content for games.
While many of these methods are commonly thought of as AI methods, others are
drawn from graphics, theoretical computer science or even biology. The various
methods differ also in what types of content they are suitable to generate. In this section,
we discuss a number of PCG methods that we consider important and include
search-based, solver-based, and grammar-based methods but also cellular automata,
noise and fractals.


4.3.1 Search-Based Methods 

The search-based approach to PCG [720] has been intensively investigated in academic
PCG research in recent years. In search-based procedural content generation,
an evolutionary algorithm or some other stochastic search or optimization algorithm
is used to search for content with the desired qualities. The basic metaphor is that
of design as a search process: a good enough solution to the design problem exists
within some space of solutions, and if we keep iterating and tweaking one or many
possible solutions, keeping those changes which make the solution(s) better and
discarding those that are harmful, we will eventually arrive at the desired solution.
This metaphor has been used to describe the design process in many different disciplines;
for example, Will Wright, designer of SimCity (Electronic Arts, 1989) and
The Sims (Electronic Arts, 2000), described the game design process as search in
his talk at the 2005 Game Developers Conference. Others have previously described
how the design process, both in general and in specialized domains such as architecture,
can be conceptualized as search and implemented as a computer
program [757, 55].

The core components of the search-based approach to solving a content generation
problem are the following:

• A search algorithm. This is the “engine” of a search-based method. Often relatively
simple evolutionary algorithms work well enough; however, sometimes
there are substantial benefits to using more sophisticated algorithms that take
e.g., constraints into account, or that are specialized for a particular content
representation.

• A content representation. This is the representation of the artifacts you want to
generate, e.g., levels, quests or winged kittens. The content representation could
be anything from an array of real numbers to a graph to a string. The content
representation defines (and thus also limits) what content can be generated, and
determines whether effective search is possible; see also the discussion about
representation in Chapter 2.

• One or more evaluation functions. An evaluation function is a function from an
artifact (an individual piece of content) to a number indicating the quality of the
artifact. The output of an evaluation function could indicate e.g., the playability
of a level, the intricacy of a quest or the aesthetic appeal of a winged kitten.
Crafting an evaluation function that reliably measures the aspect of game quality
that it is meant to measure is often among the hardest tasks in developing a
search-based PCG method. Refer also to the discussion about utility in Chapter 2.

Let us look at some of the choices for content representations. To take a very
well-known example, a level in Super Mario Bros (Nintendo, 1985) might be represented
in any of the following ways:

1. Directly, as a level map, where each variable in the genotype corresponds to one
“block” in the phenotype (e.g., bricks, question mark blocks, etc.).

2. More indirectly, as a list of the positions and properties of the different game
entities such as enemies, platforms, gaps and hills (an example of this can be
found in [611]).

3. Even more indirectly, as a repository of different reusable patterns (such as collections
of coins or hills), and a list of how they are distributed (with various
transforms such as rotation and scaling) across the level map (an example of this
can be found in [649]).

4. Very indirectly, as a list of desirable properties such as the number of gaps, enemies,
or coins, the width of gaps, etc. (an example of this can be found in [617]).

5. Most indirectly, as a random number seed.

While it clearly makes no sense to evolve random number seeds (it is a representation
with no locality whatsoever), the other levels of abstraction can all make sense
under certain circumstances. The fundamental tradeoff is between more direct, more
fine-grained representations with potentially higher locality (higher correlation of
fitness between neighboring points in search space) and less direct, more coarse-grained
representations with probably lower locality but smaller search spaces.
Smaller search spaces are in general easier to search. However, larger search spaces
would (all other things being equal) allow for more different types of content to be
expressed, or in other words increase the expressive range of the generator.
The third important choice when designing a search-based PCG solution is the
evaluation function, also known as the fitness function. The evaluation function assesses
all candidate solutions, and assigns a score (a fitness value or evaluation value) to
each candidate. This is essential for the search process; if we do not have a good
evaluation function, the evolutionary process will not work as intended and will not
find good content. It could be argued that designing an entirely accurate content
quality evaluation is “AI-complete” in the sense that to really understand what a
human finds fun, you actually have to be a human or understand humans in depth.
However, as for other problems in various areas of AI, we can get pretty far with
well-designed domain-specific heuristics. In general, the evaluation function should
be designed to model some desirable quality of the artifact, e.g., its playability,
regularity, entertainment value, etc. The design of an evaluation function depends
to a great extent on the designer and what she thinks are the important aspects that
should be optimized and how to formulate that.

In search-based PCG, we can distinguish between three classes of evaluation
functions: direct, simulation-based, and interactive.

• Direct evaluation functions map the generated content (or features extracted from
it) directly to a content quality value and, in that sense, they base their fitness calculations
directly on the phenotype representation of the content. No simulation
of the gameplay is performed during the mapping. Some direct evaluation functions
are hand-coded, and some are learned from data. Direct evaluation functions
are fast to compute and often relatively easy to implement, but it is sometimes
hard to devise a direct evaluation function for some aspects of game content. Example
features include the placement of bases and resources in real-time strategy
games [712], the size of the ruleset in strategy games or the current mood
of the game scene based on visual attention theory [185]. The mapping between
features and fitness might be contingent on a model of the playing style, preferences
or affective state of players. An example of this form of fitness evaluation
can be seen in the study done by Shaker et al. [617, 610] for personalizing player
experience using models of players to give a measure of content quality. In that
study, the authors trained neural networks to predict the player experience (such
as challenge, frustration and enjoyment) of players given a playing style and
the characteristics of a level as inputs. These trained neural networks can then
be used as fitness functions by searching for levels that, for example, maximize
predicted enjoyment while minimizing frustration. Or the exact opposite, if you
wish.

• Simulation-based evaluation functions use AI agents that play through the generated
content in order to estimate its quality. Statistics are calculated about the
agents’ behavior and playing style and these statistics are then used to score game
content. The type of the evaluation task determines the area of proficiency of the
AI agent. If content is evaluated on the basis of playability, e.g., the existence of
a path from the start to the end in a maze or a level in a 2D platform game, then
AI agents should be designed that excel in reaching the end of the game. On the
other hand, if content is optimized to maximize a particular player experience,
then an AI agent that imitates human behavior should generally be adopted. An
example study that implements a human-like agent for assessing content quality
is presented in [704], where neural-network-based controllers are trained to drive
like human players in a car racing game and then used to evaluate the generated
tracks. Each track generated is given a fitness value according to statistics calculated
while the AI controller is playing. Another example of a simulation-based
evaluation function is measuring the average fighting time of bots in a first-person
shooter game [103]. In that study, levels were simply selected to maximize the
amount of time bots spent on the level before killing each other.

• Interactive evaluation functions evaluate content based on interaction with a human,
so they require a human “in the loop”. Examples of this method can be
found in the work by Hastings et al. [250], who implemented this approach by
evaluating the quality of the personalized weapons evolved implicitly, based on
how often and how long the player chooses to use these weapons. Figure 4.3
presents two examples of evolved weapons for different players. Cardamone et
al. [102] also used this form of evaluation to score racing tracks according to
the users’ reported preferences. Ølsted et al. used the same approach to design
levels for first-person shooter games [501]. The first case is an example of an
implicit collection of data, i.e., the players did not answer direct questions
about their preferences, while players’ preferences were collected explicitly in
the second. The problem with explicit data collection is that, if not well integrated,
it requires the gameplay session to be interrupted. This method, however,
provides a reliable and accurate estimator of player experience, as opposed to
implicit data collection, which is usually noisy and based on assumptions. Hybrid
approaches are sometimes employed to mitigate the drawbacks of these two
methods by collecting information across multiple modalities such as combining
player behavior with eye gaze and/or skin conductance. Example applications
of this approach can be found in biofeedback-based camera viewpoint generation,
level generation and visuals generation in physical interactive
games.

Fig. 4.3 Examples of evolved weapons in Galactic Arms Race. Images obtained from
www.aigameresearch.org.
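
A simulation-based evaluation function can be illustrated with a deliberately
simple, hand-coded agent. The sketch below is only an assumption-laden toy and not
the neural-network controllers or bots used in the cited studies: the level is a
hypothetical string of ground ('#') and gap ('.') tiles that starts on ground, the
agent walks from left to right and can clear gaps of at most max_jump tiles, and
the statistic used as fitness is the fraction of the level it manages to cover.

```python
def simulate_run(level, max_jump=2):
    """Play the level with a simple agent; return (tiles covered, jumps made).

    '#' is ground and '.' is a gap. The agent jumps a gap only if it is at
    most `max_jump` tiles wide; otherwise the run ends at the gap's edge.
    """
    position = 0
    jumps = 0
    while position < len(level) - 1:
        nxt = position + 1
        if level[nxt] == "#":
            position = nxt
            continue
        # measure the gap that starts at nxt
        gap_end = nxt
        while gap_end < len(level) and level[gap_end] == ".":
            gap_end += 1
        gap_width = gap_end - nxt
        if gap_end == len(level) or gap_width > max_jump:
            break  # gap too wide, or no landing tile: the agent falls
        position = gap_end  # land on the first ground tile after the gap
        jumps += 1
    return position, jumps

def simulation_fitness(level, max_jump=2):
    """Score in [0, 1]: the fraction of the level the agent completed."""
    position, _ = simulate_run(level, max_jump)
    return position / (len(level) - 1)

print(simulation_fitness("##..##"))   # every gap is jumpable
print(simulation_fitness("##...#"))   # the three-tile gap stops the agent
```

Other statistics gathered during the run, such as the number of jumps, could be
folded into the score in the same way, depending on what quality is being modeled.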
Search-based methods have extremely broad applicability, as evolutionary computation
can be used to construct almost any kind of game content. However, this
generality comes with a number of drawbacks. One is that it is generally a rather
slow method, requiring the evaluation of a large number of candidate content items.
The time required to evolve a good solution can also not be precisely predicted, as
there is no runtime guarantee for evolutionary algorithms. This might make search-based
PCG solutions unsuitable for time-critical content generation, such as when
you only have a few seconds to serve up a new level in a game. It should also be
noted that the successful application of search-based PCG methods relies on judicious
design choices when it comes to the particular search algorithm, representation
and evaluation function.

As we will see below, there are several algorithms that are generally better suited
than evolutionary algorithms for generating game content of specific types. However,
none of them have the versatility of the search-based approach, with abilities
to incorporate all kinds of objectives and constraints. As we will see, many other
algorithms for content generation can also be combined with search-based solutions,
so that evolutionary algorithms can be used to search the parameter space of other
algorithms.
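
To make the three components concrete, the following minimal sketch puts them
together in Python. It is one possible instance under stated assumptions, not the
approach of any cited study: the representation is direct (a string with one
character per tile), the evaluation function is a hand-coded direct one, and the
search algorithm is a simple (mu + lambda) evolution strategy with mutation only.
The constants, penalty weights and function names are all invented for this
illustration.

```python
import random

LENGTH = 30
TARGET_GAP_TILES = 8   # desired number of gap tiles (a stand-in for "difficulty")
MAX_GAP_WIDTH = 2      # wider gaps are treated as unjumpable

def evaluate(level):
    """Direct evaluation function: higher is better, 0 is a perfect score."""
    widest, run = 0, 0
    for tile in level:
        run = run + 1 if tile == "." else 0
        widest = max(widest, run)
    gap_error = abs(level.count(".") - TARGET_GAP_TILES)
    width_penalty = max(0, widest - MAX_GAP_WIDTH)
    edge_penalty = (level[0] == ".") + (level[-1] == ".")
    return -(gap_error + 5 * width_penalty + 5 * edge_penalty)

def mutate(level, rng, rate=0.05):
    """Flip each tile between ground and gap with a small probability."""
    flip = {"#": ".", ".": "#"}
    return "".join(flip[t] if rng.random() < rate else t for t in level)

def evolve(seed=0, mu=10, lam=10, generations=200):
    """A (mu + lambda) evolution strategy over level strings."""
    rng = random.Random(seed)
    population = ["".join(rng.choice("#.") for _ in range(LENGTH))
                  for _ in range(mu)]
    population.sort(key=evaluate, reverse=True)
    for _ in range(generations):
        offspring = [mutate(rng.choice(population), rng) for _ in range(lam)]
        # keep the best mu of parents and offspring (elitist selection)
        population = sorted(population + offspring, key=evaluate, reverse=True)[:mu]
        if evaluate(population[0]) == 0:
            break  # good enough: all criteria are met
    return population[0]

best = evolve(seed=1)
print(best, evaluate(best))
```

Because selection is elitist, the best score never decreases from one generation to
the next, but, as noted above, there is no guarantee on how many generations are
needed to reach a perfect score.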


4.3.2 Solver-Based Methods 

While the search-based approach to content generation means using one or several
objective functions, perhaps in conjunction with constraints, to specify for a randomized
search algorithm such as an evolutionary algorithm what to look for, there
is another approach to PCG which is also based on the idea of content generation as
search in a space of artifacts. Solver-based methods for PCG use constraint solvers,
such as those used in logic programming, to search for content artifacts that satisfy
a number of constraints.

Constraint solvers allow you to specify a number of constraints in a specific
language; some solvers require you to specify the constraints mathematically, others
use a logic language for specification. Behind the scenes they can be implemented
in many different ways. While there are some solvers that are based on evolutionary
computation, it is more common to use specialized methods, such as reducing the
problem to a SAT (satisfiability) problem and using a SAT solver to find solutions.
Many such solvers progress not through evaluating whole solutions, but by searching in
spaces of partial solutions. This has the effect of iteratively pruning the search space,
eliminating parts of the search space repeatedly until only viable solutions are left.
This marks a sharp difference from the search-based paradigm, and also suggests
some differences in the use cases for these classes of algorithms. For example, while
evolutionary algorithms are anytime algorithms, i.e., they can be stopped at any
point and some kind of solution will be available (though a better solution would
probably have been available if you had let the algorithm run for longer), this is not
always the case with constraint satisfaction methods, though the time taken until a
viable solution is found can be very low, depending on how many constraints need to
be satisfied. Unlike with evolutionary algorithms, it is possible to prove bounds on
the worst-case complexity of SAT solvers and the algorithms that depend on them.
However, while the worst-case complexity is often shockingly high, such algorithms
can be very fast when applied (judiciously) in practice.
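
The idea of searching in a space of partial solutions and pruning it can be shown
with a plain backtracking search. This is a sketch for illustration only; it is
neither a SAT solver nor an ASP solver, and the content domain is made up: a
hypothetical linear dungeon whose rooms are assigned one of four types, subject to
the constraints that there is exactly one key and one locked door, the key comes
before the door, and no two enemy rooms are adjacent.

```python
ROOM_TYPES = ("empty", "enemy", "key", "door")

def consistent(partial):
    """Check the constraints that can already be decided on a partial layout."""
    if partial.count("key") > 1 or partial.count("door") > 1:
        return False                      # at most one key and one locked door
    if "door" in partial and "key" not in partial[:partial.index("door")]:
        return False                      # the key must come before the door
    if any(a == b == "enemy" for a, b in zip(partial, partial[1:])):
        return False                      # no two enemy rooms in a row
    return True

def complete(layout):
    """Constraints that can only be checked on a full layout."""
    return "key" in layout and "door" in layout

def solve(n_rooms, partial=()):
    """Yield every layout of n_rooms rooms that satisfies all constraints."""
    if not consistent(partial):
        return                            # prune: no extension can repair this
    if len(partial) == n_rooms:
        if complete(partial):
            yield partial
        return
    for room in ROOM_TYPES:
        yield from solve(n_rooms, partial + (room,))

layouts = list(solve(4))
print(len(layouts), "valid layouts, for example:", layouts[0])
```

Every violated constraint removes a whole subtree of layouts at once, which is why
such methods can be fast in practice. Unlike the evolutionary sketch style of
search, however, nothing usable is available until a complete layout has been
reached.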

An example of solver-based methods is the level generator for Smith and Whitehead’s
Tanagra mixed-initiative platform level design tool [642]. At the core of the
tool is a constraint-based generator of platform game levels. The constraint solver
uses a number of constraints on what constitutes a solvable platform level (e.g.,
maximum jump length, and distance from jumps to enemies) as well as constraints
based on aesthetic considerations, such as the “rhythm” of the level, to generate
new platform game levels or level segments. This generation happens very fast and
produces good results, but is limited to fairly linear levels. Another example is the
work by El-Nasr et al. on procedural generation of lighting [188, 185]. The system
developed in those studies configures and continuously modulates the lighting in
a scene with the aim of enhancing player experience by using constrained non-linear
optimization to select the best lighting configuration.

One approach to constraint solving which can be particularly useful for PCG because
of its generality is Answer Set Programming (ASP) [638]. ASP builds on
AnsProlog, a constraint programming language which is similar to Prolog [69].
Complex sets of constraints can be specified in AnsProlog, and an ASP solver can
then be used to find all models (all configurations of variables) that satisfy the constraints.
For PCG purposes, the model (the set of parameters) can be used to describe
a world, a story or similar, and the constraints can specify playability or various aesthetic
considerations. An example of an ASP application for level and puzzle generation
is the Refraction game (see Fig. 4.4). For a good introduction to and overview
of the use of ASP in PCG you may refer to [638, 485].

Generally, solver-based methods can be suitable when the whole problem can
be encoded in the language of the constraint solver (such as AnsProlog). It is generally
complicated to include simulation-based tests or any other call to the game
engine inside a constraint satisfaction program. An alternative if simulation-based
tests need to be performed is evolutionary algorithms, which can also be used to
solve constraint satisfaction problems. This allows a combination of fitness values
and constraints to drive evolution [376, 382, 240].


4.3.3 Grammar-Based Methods 

Grammars are fundamental structures in computer science that also have many applications
in procedural content generation. In particular, they are very frequently
used for producing plants, such as trees, which are commonly used in many different
types of games. However, grammars have also been used to generate missions
and dungeons [173, 174], rocks [159], underwater environments and caves.
In these cases, grammars are used as a constructive method, creating content without
any evaluation functions or re-generation. However, grammar methods can also
be used together with search-based methods, so that the grammar expansion is used
as a genotype-to-phenotype mapping.

Fig. 4.4 A screenshot from the Refraction educational game. A solver-based PCG method (ASP)
is used to generate the levels and puzzles of the game. Further details about the application of ASP
to Refraction can be found in [635, 89].

A (formal) grammar is a set of production rules for rewriting strings, i.e., turning
one string into another. Each rule is of the form (symbol(s)) → (other symbol(s)).
Here are some example production rules:

1. A → AB

2. B → b

Expanding a grammar is as simple as going through a string, and each time a symbol
or sequence of symbols that occurs in the left-hand side of a rule is found, those
symbols are replaced by the right-hand side of that rule. For example, if the initial
string is A, in the first rewriting step the A would be replaced by AB by rule 1, and
the resulting string will be AB. In the second rewriting step, the A would again be
transformed to AB and the B would be transformed to b using rule 2, resulting in
the string ABb. The third step yields the string ABbb and so on. A convention in
grammars is that upper-case characters are nonterminal symbols, which are on the
left-hand side of rules and therefore rewritten further, whereas lower-case characters
are terminal symbols which are not rewritten further.

As a second example, consider the rules A → AB and B → A (the rule set that
reproduces the sequence below), with every symbol rewritten in parallel at each
step. Starting with the axiom A (in L-systems the seed strings are called axioms)
the first few expansions look as follows:

A 

AB 

ABA 

ABAAB 

ABAABABA 

ABAABABAABAAB 

ABAABABAABAABABAABABA 

ABAABABAABAABABAABABAABAABABAABAAB 

This particular grammar is an example of an L-system. L-systems are a class of
grammars whose defining feature is parallel rewriting; they were introduced by the
biologist Aristid Lindenmayer in 1968 explicitly to model the growth of organic
systems such as plants and algae. With time, they have turned out to be very
useful for generating plants in games as well as in theoretical biology.
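Parallel rewriting is short to express in code. The following is a minimal sketch, not a reference implementation: the function names rewrite and expand are hypothetical, and the rule set A → AB, B → A is an assumption inferred from the expansion sequence listed above.

```python
def rewrite(string: str, rules: dict[str, str]) -> str:
    """Apply one parallel rewriting step: every symbol is replaced at once."""
    # Symbols that have no rule (terminals) are copied unchanged.
    return "".join(rules.get(symbol, symbol) for symbol in string)


def expand(axiom: str, rules: dict[str, str], steps: int) -> list[str]:
    """Return the axiom followed by the result of each rewriting step."""
    strings = [axiom]
    for _ in range(steps):
        strings.append(rewrite(strings[-1], rules))
    return strings


# Rule set inferred from the expansion sequence in the text (an assumption).
l_system_rules = {"A": "AB", "B": "A"}
for s in expand("A", l_system_rules, 5):
    print(s)
```

The same two functions also reproduce the earlier grammar example: with the rules A → AB and B → b, three steps starting from A give ABbb.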

One way of using the power of L-systems to generate 2D (and 3D) artifacts is to 
interpret the generated strings as instructions for a turtle in turtle graphics. Think 
of the turtle as moving across a plane holding a pencil, and simply drawing a line 
that traces its path. We can give commands to the turtle to move forwards, or to turn 
left or right. For example, we can define the L-system alphabet {F, +, -, [, ]} and 
then use the following key to interpret the generated strings:

• F: move forward a certain distance (e.g., 10 pixels).

• +: turn left 30 degrees.

• -: turn right 30 degrees.

• [: push the current position and orientation onto the stack.

• ]: pop the position and orientation off the stack.

Bracketed L-systems can be used to generate surprisingly plant-like structures.
Consider the L-system defined by the single rule F → F[-F]F[+F][F]. Figure 4.5
shows the graphical interpretation of the L-system after 1, 2, 3 and 4 rewrites
starting from the single symbol F. Minor variations of the rule in this system
generate different but still plant-like structures, and the general principle can
easily be extended to three dimensions by introducing symbols that represent
rotation along the axis of drawing. For this reason, many standard packages for
generating plants in game worlds are based on L-systems or similar grammars. For a
multitude of beautiful examples of plants generated by L-systems refer to the book
The Algorithmic Beauty of Plants by Prusinkiewicz and Lindenmayer [542].
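The turtle interpretation can be sketched in a few lines. This is an illustrative sketch only: the function names, the starting pose (origin, facing up) and the representation of the drawing as a list of line segments are assumptions, chosen so that no graphics library is needed.

```python
import math


def rewrite(string, rules, steps):
    """Rewrite all symbols in parallel for the given number of steps."""
    for _ in range(steps):
        string = "".join(rules.get(symbol, symbol) for symbol in string)
    return string


def interpret(commands, step=10.0, angle_deg=30.0):
    """Turn an L-system string into line segments ((x0, y0), (x1, y1))."""
    x, y, heading = 0.0, 0.0, 90.0  # start at the origin, facing up (assumption)
    stack = []
    segments = []
    for command in commands:
        if command == "F":  # move forward and draw
            nx = x + step * math.cos(math.radians(heading))
            ny = y + step * math.sin(math.radians(heading))
            segments.append(((x, y), (nx, ny)))
            x, y = nx, ny
        elif command == "+":  # turn left (counter-clockwise)
            heading += angle_deg
        elif command == "-":  # turn right (clockwise)
            heading -= angle_deg
        elif command == "[":  # remember the current position and orientation
            stack.append((x, y, heading))
        elif command == "]":  # return to the last remembered state
            x, y, heading = stack.pop()
    return segments


plant = rewrite("F", {"F": "F[-F]F[+F][F]"}, 2)
print(len(interpret(plant)), "segments")
```

The resulting segments can be handed to any drawing routine. Because each F is replaced by a string containing five Fs, the number of segments grows by a factor of five per rewrite.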

There are many extensions of the basic L-system formalism, including
non-deterministic L-systems, which can help with increasing diversity of the
generated content, and context-sensitive L-systems, which can produce more
complicated patterns. Formally specifying L-systems can be a daunting task, in
particular as the mapping between the axiom and rules on the one hand and the
results after expansion on the other is so complex. However, search-based methods
can be used to find good axioms or rules, using for example desired height or
complexity of the plant in the evaluation function [498].

Fig. 4.5 Four rewrites of the bracketed L-system F → F[-F]F[+F][F].


4.3.4 Cellular Automata 

A cellular automaton (plural: cellular automata) is a discrete model of
computation which is widely studied in computer science, physics, complexity
science and even some branches of biology, and can be used to computationally model
biological and physical phenomena such as growth, development, patterns, forms, or
even emergence. While cellular automata (CA) have been the subject of extensive
study, the basic concept is actually very simple and can be explained in a sentence
or two: cellular automata are a set of cells placed on a grid that change through a
number of discrete time steps according to a set of rules; these rules rely on the
current state of each cell and the state of its neighboring cells. The rules can be
applied iteratively for as many time steps as we desire. The conceptual idea behind
cellular automata was introduced by Stanislaw Ulam and John von Neumann [742, 487]
back in the 1940s; it took about 30 more years, however, for us to see an
application of cellular automata that showed their potential beyond basic research.
That application was the two-dimensional cellular automaton designed in Conway’s
Game of Life [134]. The Game of Life is a zero-player game; its outcome is not
influenced by the player’s input throughout the game and it is solely dependent on
its initial state (which is determined by the player).

A cellular automaton contains a number of cells represented in any number of
dimensions; most cellular automata, however, are either one-dimensional (vectors)
or two-dimensional (matrices). Each cell can have a finite number of states; for
instance, the cell can be on or off. A set of cells surrounding each cell defines its
neighborhood. The neighborhood defines which cells around a particular cell affect
the cell’s future state and its size can be represented by any integer number greater
than or equal to 1. For one-dimensional cellular automata, for instance, the
neighborhood is defined by the number of cells to the left or the right of the cell.
For two-dimensional cellular automata, the two most common neighborhood types are
the Moore and the von Neumann neighborhood. The former neighborhood type is a square
consisting of the cells surrounding a cell, including those surrounding it
diagonally; for example, a Moore neighborhood of size 1 contains the eight cells
surrounding each cell. The latter neighborhood type, instead, forms a cross of cells
which are centered on the cell considered. For example, a von Neumann neighborhood
of size 1 consists of the four cells surrounding the cell, above, below, to the left
and to the right.
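The two neighborhood types can be written down as lists of coordinate offsets relative to the cell considered. The sketch below uses hypothetical function names. For sizes above 1 it assumes the common definition of the von Neumann neighborhood as all cells within a given Manhattan distance, which the text does not spell out.

```python
def moore_offsets(size=1):
    """Offsets of all cells in the square of radius `size`, excluding the center."""
    return [
        (dx, dy)
        for dy in range(-size, size + 1)
        for dx in range(-size, size + 1)
        if (dx, dy) != (0, 0)
    ]


def von_neumann_offsets(size=1):
    """Offsets within Manhattan distance `size` of the center (assumed definition)."""
    return [
        (dx, dy)
        for dy in range(-size, size + 1)
        for dx in range(-size, size + 1)
        if 0 < abs(dx) + abs(dy) <= size
    ]


print(len(moore_offsets(1)), len(von_neumann_offsets(1)))  # 8 4
print(len(moore_offsets(2)))  # 24
```

A Moore neighborhood of size 2 thus contains 24 cells, which puts a threshold such as 13 at just over half of the neighborhood.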

At the beginning of an experiment (at time t = 0) we initialize the cells by
assigning a state for each one of them. At each time step t we create a new
generation of cells according to a rule or a mathematical function which specifies
the new state of each cell given the current state of the cell and the states of the
cells in its neighborhood at time t - 1. Normally, the rule for updating the state of
the cells remains the same across all cells and time steps (i.e., it is static) and is
applied to the whole grid.
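A synchronous update of this kind can be sketched as follows. The names step and life_rule are hypothetical, and cells outside the grid are simply ignored, which is an assumption. As the example rule the sketch uses the well-known birth and survival rule of Conway's Game of Life, which the text mentions but does not state.

```python
def step(grid, rule, offsets):
    """Compute generation t from generation t - 1 without modifying the input."""
    height, width = len(grid), len(grid[0])
    new_grid = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Collect neighbor states from the OLD grid only.
            neighbors = [
                grid[y + dy][x + dx]
                for dx, dy in offsets
                if 0 <= y + dy < height and 0 <= x + dx < width
            ]
            new_grid[y][x] = rule(grid[y][x], neighbors)
    return new_grid


MOORE = [(dx, dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dx, dy) != (0, 0)]


def life_rule(state, neighbors):
    """Game of Life: birth on 3 live neighbors, survival on 2 or 3."""
    alive = sum(neighbors)
    return 1 if alive == 3 or (state == 1 and alive == 2) else 0


blinker = [
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
for row in step(blinker, life_rule, MOORE):
    print(row)
```

Writing the new generation into a separate grid is what makes the update synchronous: every cell sees the states of time t - 1, never a partially updated grid.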

Cellular automata have been used extensively in games for modeling environmental
systems like heat and fire, rain and fluid flow, pressure and explosions [209, 676]
and in combination with influence maps for agent decision making [678, 677].
Another use for cellular automata has been for thermal and hydraulic erosion in
procedural terrain generation [500]. Of particular interest for the purposes of this
section is the work of Johnson et al. [304] on the generation of infinite cave-like
dungeons using cellular automata. The motivation in that study was to create an
infinite cave-crawling game, with environments stretching out endlessly and
seamlessly in every direction. An additional design constraint was that the caves are
supposed to look organic or eroded, rather than having straight edges and angles. No
storage medium is large enough to store a truly endless cave, so the content must
be generated at runtime, as players choose to explore new areas. The game does not
scroll but instead presents the environment one screen at a time, which offers a time
window of a few hundred milliseconds in which to create a new room every time
the player exits a room.

The method introduced by Johnson et al. [304] used the following four parameters
to control the map generation process:

• a percentage of rock cells (inaccessible areas) at the beginning of the process;

• the number of CA generations (iterations);

• a neighborhood threshold value that defines a rock;

• the Moore neighborhood size.

Fig. 4.6 Cave generation: comparison between a CA and a randomly generated map; panel (b) shows
the map generated with cellular automata. The CA parameters used are as follows: the CA runs for
four generations; the size of the Moore neighborhood considered is 1; the threshold value for the
CA rule is 5 (T = 5); and the percentage of rock cells at the beginning of the process is 50% (for
both maps). Rock and wall cells are represented by white and red color respectively. Colored areas
represent different tunnels (floor clusters). Images adapted from [304].


In the dungeon generation implementation presented in [304], each room is a
50 × 50 grid, where each cell can be in one of two states: empty or rock. Initially,
the grid is empty. The generation of a single room is as follows.

1. The grid is “sprinkled” with rocks: for each cell, there is a probability (e.g., 0.5)
that it is turned into rock. This results in a relatively uniform distribution of rock
cells.

2. A number of CA generations (iterations) are applied to this initial grid.

3. For each generation the following simple rule is applied to all cells: a cell turns
into rock in the next iteration if at least T (e.g., 5) of its neighbors are rock,
otherwise it will turn into free space.

4. For aesthetic reasons the rock cells that border on empty space are designated as
“wall” cells, which are functionally rock cells but look different.
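The four steps above can be sketched as follows. This is not the implementation of the cited study but a minimal illustration under stated assumptions: the names generate_room and count_rock are hypothetical, cells outside the grid are counted as rock, and "bordering on empty space" is taken to mean adjacency in the four cardinal directions.

```python
import random

EMPTY, ROCK, WALL = 0, 1, 2


def count_rock(grid, x, y, size):
    """Count rock cells in the Moore neighborhood; off-grid cells count as rock (assumption)."""
    n = len(grid)
    total = 0
    for dy in range(-size, size + 1):
        for dx in range(-size, size + 1):
            if dx == 0 and dy == 0:
                continue
            nx, ny = x + dx, y + dy
            if not (0 <= nx < n and 0 <= ny < n) or grid[ny][nx] == ROCK:
                total += 1
    return total


def generate_room(n=50, rock_prob=0.5, generations=4, threshold=5, size=1, seed=None):
    rng = random.Random(seed)
    # Step 1: sprinkle rocks onto an initially empty grid.
    grid = [[ROCK if rng.random() < rock_prob else EMPTY for _ in range(n)]
            for _ in range(n)]
    # Steps 2 and 3: apply the threshold rule to all cells for several generations.
    for _ in range(generations):
        grid = [[ROCK if count_rock(grid, x, y, size) >= threshold else EMPTY
                 for x in range(n)] for y in range(n)]
    # Step 4: rock cells next to empty space become wall cells (4-adjacency: assumption).
    room = [row[:] for row in grid]
    for y in range(n):
        for x in range(n):
            if grid[y][x] != ROCK:
                continue
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < n and 0 <= ny < n and grid[ny][nx] == EMPTY:
                    room[y][x] = WALL
                    break
    return room


symbols = {EMPTY: ".", ROCK: "#", WALL: "+"}
for row in generate_room(n=20, seed=1):
    print("".join(symbols[cell] for cell in row))
```

The four keyword arguments rock_prob, generations, threshold and size correspond to the four control parameters listed earlier, so varying them is a quick way to get a feel for their combined effect on the output.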

The aforementioned simple procedure generates a surprisingly lifelike cave-room.
Figure 4.6 shows a comparison between a random map (sprinkled with rocks) and the
results of a few iterations of the cellular automaton. But while this process
generates a single room, a game would normally require a number of connected
rooms. A generated room might not have any openings in the confining rocks, and
there is no guarantee that any exits align with entrances to the adjacent rooms.
Therefore, whenever a room is generated, its immediate neighbors are also
generated. If there is no connection between the largest empty spaces in the two
rooms, a tunnel is drilled between those areas at the point where they are least
separated. A few more iterations of the CA algorithm are then run on all nine
neighboring rooms together, to smooth out any sharp edges. Figure 4.7 shows the
result of this process, in the form of nine rooms that seamlessly connect. This
generation process is extremely fast, and can generate all nine rooms in less than a
millisecond on a modern computer. A similar approach to that of Johnson et al. is
featured in the Galak-Z (17-bit, 2016) game for dungeon generation. In that game
cellular automata generate the individual rooms of levels and the rooms are tied
together via a variation of a Hilbert curve, which is a continuous fractal
space-filling curve [261]. Galak-Z (17-bit, 2016) shows an alternative way of
combining CA with other methods for achieving the desired map generation result.

Fig. 4.7 Cave generation: a 3 × 3 base grid map generated with CA. Rock and wall cells are
represented by white and red color respectively; gray areas represent floor. Moore neighborhood
size is 2, T is 13, the number of CA iterations is 4, and the percentage of rock cells at the
initialization phase is 50%. Image adapted from [304].

In summary, CA are very fast constructive methods that can be used effectively
to generate certain kinds of content such as terrains and levels (as, e.g., in [304]),
but they can also be potentially used to generate other types of content. The greatest
benefits a CA algorithm can offer to a game content generator are that it depends on
a small number of parameters and that it is intuitive and relatively simple to grasp
and implement. However, the algorithm’s constructive nature is the main cause of
its disadvantages. For both designers and programmers, it is not trivial to fully
understand the impact that a single parameter may have on the generation process,
since each parameter affects multiple features of the generated output. While the
few parameters of the core algorithm allow for a certain degree of controllability,
the algorithm cannot guarantee properties such as playability or solvability of
levels. Further, it is not possible to design content that has specific requirements,
e.g., a map with a certain connectivity, since gameplay features are disjoint from
the control parameters of the CA. Thus, any link between the CA generation method
and gameplay features would have to be created through a process of trial and error.
In other words, one would need to resort to preprocessing or a generate-and-test
approach.
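A generate-and-test loop of this kind can be sketched as follows. Everything here is illustrative: random_map is a stand-in for any constructive generator (such as a CA), and the test used, that the largest connected open region covers a minimum fraction of the map, is just one hypothetical connectivity requirement.

```python
import random
from collections import deque


def random_map(n, rock_prob, rng):
    """Stand-in generator; any constructive generator could be plugged in here."""
    return [[1 if rng.random() < rock_prob else 0 for _ in range(n)] for _ in range(n)]


def largest_open_region(grid):
    """Size of the largest 4-connected region of empty (0) cells, found by flood fill."""
    n = len(grid)
    seen = [[False] * n for _ in range(n)]
    best = 0
    for sy in range(n):
        for sx in range(n):
            if grid[sy][sx] != 0 or seen[sy][sx]:
                continue
            seen[sy][sx] = True
            queue = deque([(sx, sy)])
            size = 0
            while queue:
                x, y = queue.popleft()
                size += 1
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nx, ny = x + dx, y + dy
                    if (0 <= nx < n and 0 <= ny < n
                            and grid[ny][nx] == 0 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            best = max(best, size)
    return best


def generate_and_test(n=20, rock_prob=0.3, min_open_fraction=0.5,
                      max_attempts=1000, seed=0):
    """Regenerate until the connectivity test passes or the attempts run out."""
    rng = random.Random(seed)
    for attempt in range(1, max_attempts + 1):
        grid = random_map(n, rock_prob, rng)
        if largest_open_region(grid) >= min_open_fraction * n * n:
            return grid, attempt
    return None, max_attempts


grid, attempts = generate_and_test()
print("accepted" if grid is not None else "gave up", "after", attempts, "attempt(s)")
```

The generator itself stays unaware of the requirement; the cost of that simplicity is that the number of attempts needed is not known in advance.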


4.3.5 Noise and Fractals 

One class of algorithms that are very frequently used to generate heightmaps and 
textures are noise algorithms, many of which are fractal algorithms, meaning that 
they exhibit scale-invariant properties. Noise algorithms are usually fast and easy to 
use but lack in controllability. 

Both textures and many aspects of terrains can fruitfully be represented as
two-dimensional matrices of real numbers. The width and height of the matrix map to
the x and y dimensions of a rectangular surface. In the case of a texture, this is called
an intensity map, and the values of cells correspond directly to the brightness of the
associated pixels. In the case of terrains, the value of each cell corresponds to the
height of the terrain (over some baseline) at that point. This is called a heightmap. If
the resolution with which the terrain is rendered is greater than the resolution of the
heightmap, intermediate points on the ground can simply be interpolated between
points that do have specified height values. Thus, using this common representation,
any technique used to generate noise could also be used to generate terrains, and vice
versa, though they might not be equally suitable.
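As an illustration of interpolating between heightmap points, the sketch below uses bilinear interpolation. The choice of bilinear interpolation and the function name sample_height are assumptions; the text does not prescribe a particular interpolation scheme. Coordinates are assumed to lie inside the map, which must be at least 2 × 2.

```python
def sample_height(heightmap, x, y):
    """Bilinearly interpolate the height at fractional coordinates (x, y)."""
    rows, cols = len(heightmap), len(heightmap[0])
    # Pick the top-left corner of the grid cell containing (x, y).
    x0 = min(int(x), cols - 2)
    y0 = min(int(y), rows - 2)
    tx, ty = x - x0, y - y0
    # Interpolate along x on the two bounding rows, then along y between them.
    top = (1 - tx) * heightmap[y0][x0] + tx * heightmap[y0][x0 + 1]
    bottom = (1 - tx) * heightmap[y0 + 1][x0] + tx * heightmap[y0 + 1][x0 + 1]
    return (1 - ty) * top + ty * bottom


heightmap = [
    [0.0, 1.0],
    [2.0, 3.0],
]
print(sample_height(heightmap, 0.5, 0.5))  # 1.5
```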

It should be noted that in the case of terrains, other representations are possible
and occasionally suitable or even necessary. For example, one could represent the
terrain in three dimensions, by dividing the space up into voxels (cubes) and
computing the three-dimensional voxel grid. An example is the popular open-world
game Minecraft (Mojang, 2011), which uses unusually large voxels. Voxel grids allow
structures that cannot be represented with heightmaps, such as caves and
overhanging cliffs, but they require a much larger amount of storage.

Fractals [180, 500] such as midpoint displacement algorithms are in common use
for real-time map generation. Midpoint displacement is a simple algorithm for
generating two-dimensional landscapes (seen from the side) by repeatedly
subdividing a line. The procedure is as follows: start with a horizontal line. Find
the midpoint of that line, and move that point up or down by a random amount, thus
breaking the line in two. Now do the same thing for both of the resulting lines,
and so on for as many steps as you need in order to reach sufficient resolution.
Every time you call the algorithm recursively, lower the range of the random number
generator somewhat (see Fig. 4.8 for an example).

Fig. 4.8 The Midpoint Displacement algorithm visualized (panels: Generation 0, Generation 1,
Generation 2, Generation 3, Final Generation).
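The procedure can be sketched iteratively on an array of 2^levels + 1 height values. The function name, the parameter names and the choice to multiply the random range by a fixed roughness factor at each level are assumptions; the text only says that the range should be lowered somewhat.

```python
import random


def midpoint_displacement(levels, displacement=1.0, roughness=0.5, seed=None):
    """Return 2**levels + 1 heights describing a landscape seen from the side."""
    rng = random.Random(seed)
    size = 2 ** levels + 1
    heights = [0.0] * size  # start with a horizontal line
    step = size - 1
    while step > 1:
        half = step // 2
        # Displace the midpoint of every line segment at the current level.
        for left in range(0, size - 1, step):
            right = left + step
            average = (heights[left] + heights[right]) / 2
            heights[left + half] = average + rng.uniform(-displacement, displacement)
        displacement *= roughness  # lower the random range for the next level
        step = half
    return heights


print([round(h, 2) for h in midpoint_displacement(3, seed=42)])
```

Smaller roughness values shrink the random range faster and give smoother outlines; values close to 1 give jagged ones.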

A useful and simple way of extending the midpoint displacement idea to two
dimensions (and thus creating two-dimensional heightmaps which can be interpreted
as three-dimensional landscapes) is the Diamond-Square algorithm (also known
as “the cloud fractal” or “the plasma fractal” because of its frequent use for creating
such effects) [210]. This algorithm uses a square 2D matrix with width and height
2^n + 1. To run the algorithm you normally initialize the matrix by setting the values
of all cells to 0, except the four corner values which are set to random values in
some chosen range (e.g., [-1, 1]). Then you perform the following steps:

1. Diamond step: Find the midpoint of the four corners, i.e., the most central
cell in the matrix. Set the value of that cell to the average value of the
corners. Add a random value to the middle cell.

2. Square step: Find the four cells in between the corners. Set each of those to
the average value of the two corners surrounding it. Add a random value to
each of these cells.

Call this method recursively for each of the four subsquares of the matrix, until
you reach the resolution limit of the matrix (3 × 3 sub-squares). Every time you
call the method, reduce the range of the random values somewhat. The process is
illustrated in Fig. 4.9.
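The two steps can be sketched as follows, following the simplified variant described above, in which each edge midpoint is the average of the two corners surrounding it. Several choices are assumptions: the function and parameter names, processing all sub-squares of one size before moving to the next size instead of literal recursion, and multiplying the random range by a fixed roughness factor at each level.

```python
import random


def diamond_square(n, spread=1.0, roughness=0.5, seed=None):
    """Return a (2**n + 1) x (2**n + 1) heightmap as a list of rows."""
    rng = random.Random(seed)
    size = 2 ** n + 1
    last = size - 1
    grid = [[0.0] * size for _ in range(size)]
    for y in (0, last):
        for x in (0, last):
            grid[y][x] = rng.uniform(-1.0, 1.0)  # random corner values
    step = last
    while step > 1:
        half = step // 2
        for top in range(0, last, step):
            for left in range(0, last, step):
                bottom, right = top + step, left + step
                cy, cx = top + half, left + half
                tl, tr = grid[top][left], grid[top][right]
                bl, br = grid[bottom][left], grid[bottom][right]
                # Diamond step: center = average of the four corners + noise.
                grid[cy][cx] = (tl + tr + bl + br) / 4 + rng.uniform(-spread, spread)
                # Square step: edge midpoints = average of two corners + noise.
                grid[top][cx] = (tl + tr) / 2 + rng.uniform(-spread, spread)
                grid[bottom][cx] = (bl + br) / 2 + rng.uniform(-spread, spread)
                grid[cy][left] = (tl + bl) / 2 + rng.uniform(-spread, spread)
                grid[cy][right] = (tr + br) / 2 + rng.uniform(-spread, spread)
        spread *= roughness  # reduce the random range for the smaller squares
        step = half
    return grid


for row in diamond_square(2, seed=3):
    print(" ".join(f"{value:6.2f}" for value in row))
```

Edge midpoints shared by two neighboring sub-squares are written twice within one level. Since they are only read at the next level, the later value simply wins and the heightmap stays consistent.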


There are many more advanced methods for generating fractal noise, with different
properties. One of the most important is Perlin noise, which has some benefits
over Diamond-Square [529]. These algorithms are covered thoroughly in books that
focus on texturing and modeling from a graphics perspective [180].


4.3.6 Machine Learning 

An emerging direction in PCG research is to train generators on existing content,
to be able to produce more content of the same type and style. This is inspired
by the recent results in deep neural networks, where network architectures such
as generative adversarial networks [232] and variational autoencoders [342]
have attained good results in learning to produce images of e.g., bedrooms, cats or
faces, and also by earlier results where both simpler learning mechanisms such as
Markov chains and more complex architectures such as recurrent neural networks
have learned to produce text and music after training on some corpus.

While these kinds of generative methods based on machine learning work well
for some types of content (most notably music and images), many types of game
content pose additional challenges. In particular, a key difference between game
content generation and procedural generation in many other domains is that most
game content has strict structural constraints to ensure playability. These constraints
differ from the structural constraints of text or music because of the need to play
games in order to experience them. A level that structurally prevents players from
finishing it is not a good level, even if it is visually attractive; a strategy game map
with a strategy-breaking shortcut will not be played even if it has interesting
features; a game-breaking card in a collectible card game is merely a curiosity; and
so on. Thus, the domain of game content generation poses different challenges from
that of other generative domains. The same methods that can produce “mostly
correct” images of bedrooms and horses, that might still have a few impossible angles
or vestigial legs, are less suitable for generating mazes which must have an exit.
This is one of the reasons why machine learning-based approaches have so far only
attained limited success in PCG for games. The other main reason is that for many
types of game content, there simply isn’t enough existing content to train on. This
is, however, an active research direction where much progress might be achieved in
the next few years.

Fig. 4.9 The Diamond-Square algorithm visualized in five steps. Adapted from a figure by
Christopher Ewin, licensed under CC BY-SA 4.0.

Fig. 4.10 Mario levels reconstructed by n-grams with n set to 1, 2, and 3, respectively
(panels: (a) n = 1, (b) n = 2, (c) n = 3).


The core difference between PCG via machine learning and approaches such as
search-based PCG is that the content is created directly (e.g., via sampling) from
models which have been trained on game content. While some search-based PCG
approaches use evaluation functions that have been trained on game content (for
instance, the work of Shaker et al. [621] or Liapis et al. [373]), the actual content
generation is still based on search. Below, we present some examples of PCG via
machine learning; these particular PCG studies built on the use of n-grams, Markov
models and artificial neural networks. For more examples of early work along
these lines, see the recent survey paper [668].


4.3.6.1 n-grams and Markov Models 

For content that can be expressed as one- or two-dimensional discrete structures,
such as many game levels, methods based on Markov models can be used. One
particularly straightforward Markov model is the n-gram model, which is commonly
used for text prediction. The n-gram method is very simple (essentially, you build
conditional probability tables from strings and sample from these tables when
constructing new strings) and also very fast.

Dahlskog et al. trained n-gram models on the levels of the original Super Mario
Bros (Nintendo, 1985) game, and used these models to generate new levels [156].
As n-gram models are fundamentally one-dimensional, these levels needed to be
converted to strings in order for n-grams to be applicable. This was done through
dividing the levels into vertical “slices,” where most slices recur many times
throughout the level [155]. This representational trick is dependent on there being a
large amount of redundancy in the level design, something that is true in many games.
Models were trained using various levels of n, and it was observed that while n = 0
creates essentially random structures and n = 1 creates barely playable levels, n = 2
and n = 3 create rather well-shaped levels. See Fig. 4.10 for examples of this.
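The n-gram idea of building conditional probability tables and sampling from them can be sketched as follows. This is a generic illustration, not the setup of the cited studies: the function names are hypothetical, the "slices" are arbitrary labels, n here denotes the total n-gram length (a context of n - 1 symbols plus the symbol that follows), and generation simply stops when it meets a context that never occurred in training.

```python
import random
from collections import Counter, defaultdict


def train_ngram(sequence, n):
    """Count which symbol follows each context of n - 1 symbols."""
    table = defaultdict(Counter)
    for i in range(len(sequence) - n + 1):
        context = tuple(sequence[i:i + n - 1])
        table[context][sequence[i + n - 1]] += 1
    return table


def generate(table, n, start, length, seed=None):
    """Sample a sequence; `start` must contain at least n - 1 symbols."""
    rng = random.Random(seed)
    output = list(start)
    while len(output) < length:
        context = tuple(output[len(output) - (n - 1):])
        counts = table.get(context)
        if not counts:
            break  # unseen context: stop (a fuller system might back off to a smaller n)
        symbols = list(counts)
        weights = [counts[s] for s in symbols]
        # Sample the next symbol in proportion to how often it followed this context.
        output.append(rng.choices(symbols, weights=weights)[0])
    return output


# Hypothetical level expressed as a sequence of vertical slice labels.
level = ["flat", "flat", "gap", "flat", "pipe", "flat", "flat", "gap", "flat", "flat"]
model = train_ngram(level, 2)
print(generate(model, 2, ["flat"], 12, seed=0))
```

Because the table only contains transitions seen in the training sequence, every adjacent pair of slices in the output also occurs somewhere in the input, which is why redundancy in the original levels matters.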

Summerville et al. extended these models with the use of Monte Carlo
tree search to guide the generation. Instead of solely relying on the learned
conditional probabilities, they used the learned probabilities during rollouts
(generation of whole levels) that were then scored based on an objective function
specified by a designer (e.g., allowing them to bias the generation towards more or
less difficult levels). The generated levels could still only come from observed
configurations, but the utilization of MCTS meant that playability guarantees could
be made and allowed for more designer control than just editing of the input corpus.
This can be seen as a hybrid between a search-based method and a machine
learning-based method. In parallel, Snodgrass and Ontanon trained two-dimensional
Markov chains (a more complex relative of the n-gram) to generate levels for both
Super Mario Bros (Nintendo, 1985) and other similar platform games, such as Lode
Runner (Brøderbund, 1983) [644].


4.3.6.2 Neural Networks 

Given the many uses of neural networks in machine learning, and the many
different neural network architectures, it is little wonder that neural networks are
also highly useful for machine learning-based PCG. Following on from the Super Mario
Bros (Nintendo, 1985) examples in the previous section, Hoover et al. [277]
generated levels for that same game by extending a representation called functional
scaffolding for musical composition (FSMC) that was originally developed to
compose music. The original FSMC representation posits that 1) music can be
represented as a function of time and 2) musical voices in a given piece are
functionally related. Through a method for evolving neural networks called
NeuroEvolution of Augmenting Topologies [655], additional musical voices are evolved
to be played simultaneously with an original human-composed voice. To extend this
musical metaphor and represent Super Mario Bros (Nintendo, 1985) levels as functions
of time, each level is broken down into a sequence of tile-width columns. The height
of each column extends the height of the screen. While FSMC represents a unit of
time by the length of an eighth-note, a unit of time in this approach is the width of
each column. At each unit of time, the system queries the ANN to decide a height to
place a tile. FSMC then inputs a musical note’s pitch and duration to the ANNs. This
approach translates pitch to the height at which a tile is placed and duration to the
number of times a tile-type is repeated at that height. For a given tile-type or musical
voice, this information is then fed to a neural network that is trained on two-thirds
of the existing human-authored levels to predict the value of a tile-type at each
column. The idea is that the neural network will learn hidden relationships between
the tile-types in the human-authored levels that can then help humans construct entire
levels from as little starting information as the layout of a single tile-type.

Of course, machine learning can also be used to generate other types of game
content that are not levels. A fascinating example of this is Mystical Tutor, a design
assistant for Magic: The Gathering cards [666]. In contrast to some of the other
generators that aim to produce complete, playable levels, Mystical Tutor
acknowledges that its output is likely to be flawed in some ways and instead aims to
provide inspirational raw material for card designers.


4.4 Roles of PCG in Games 


The generation of content algorithmically may take different roles within the
domain of games. We can identify two axes across which PCG roles can be placed:
players and designers. We envision PCG systems that either consider designers while
they generate content or operate independently of designers; the same applies for
players. Figure 4.11 visualizes the key roles of PCG in games across the dimensions
of designer initiative and player experience.

Regardless of the generation method used, game genre or content type, PCG can
act either autonomously or as a collaborator in the design process. We refer to the
former role as autonomous generation (Section 4.4.2) and the latter role as
mixed-initiative generation (Section 4.4.1). Further, we cover the experience-driven
PCG role by which PCG algorithms consider the player experience in whatever they try
to generate (Section 4.4.3). As a result, the generated content is associated with the
player and her experience. Finally, if PCG does not consider the player as part of
the generation process it becomes experience-agnostic (Section 4.4.4).

PCG techniques can be used to generate content at runtime, as the player is
playing the game, allowing the generation of endless variations, making the game
infinitely replayable and opening the possibility of generating player-adapted
content, or offline during the development of the game or before the start of a game
session. The use of PCG for offline content generation is particularly useful when
generating complex content such as environments and maps; several examples of
that were discussed at the beginning of the chapter. An example of the use of
runtime content generation can be found in the game Left 4 Dead (Valve, 2008), a
first-person shooter game that provides a dynamic experience for each player by
analyzing player behavior on the fly and altering the game state accordingly using
PCG techniques. A trend related to runtime content generation is the creation and
sharing of player-generated content. Some games such as LittleBigPlanet
(Sony Computer Entertainment, 2008) and Spore (Electronic Arts, 2008) provide
a content editor (level editor in the case of LittleBigPlanet and the Spore Creature
Creator) that allows the players to edit and upload complete creatures or levels to
a central online server where they can be downloaded and used by other players.
With respect to the four different roles of PCG in games, runtime generation is
possible in the autonomous and experience-agnostic roles, it is always the case in the
experience-driven role whereas it is impossible in the mixed-initiative role. On the
other hand, the offline generation of content can occur both autonomously and in
an experience-agnostic manner. Further, it is the only way to generate
content in a mixed-initiative fashion whereas it is not relevant for experience-driven
generation as this PCG role occurs at runtime by definition.

Fig. 4.11 The four key PCG roles in games across the dimensions of designer initiative and player
experience. For each combination of roles the figure lists a number of indicative examples of tools
or studies covered in this chapter. Axes: Player (Experience) and Designer (Initiative), the latter
ranging from Autonomous to Mixed-Initiative. Examples shown: Sonancia (Lopes et al., 2015);
Super Mario Bros (Pedersen et al., 2010); Sentient Sketchbook (Liapis et al., 2013); Garden of Eden
Creation Kit (Bethesda, 2009); SpeedTree (IDV, 2002); StarCraft Maps (Togelius et al., 2013);
Ropossum (Shaker et al., 2013); Tanagra (Smith et al., 2010).

The following four subsections outline the characteristics of each of the four PCG
roles with a particular emphasis on the mixed-initiative and the experience-driven
roles that have not yet been covered at length in this chapter.


4.4.1 Mixed-initiative 

AI-assisted game design refers to the use of AI-powered tools to support the game
design and development process. This is perhaps the AI research area which is most
promising for the development of better games [764]. In particular, AI can assist in
the creation of game content varying from levels and maps to game mechanics and
narratives.

We identify AI-assisted game design as the task of creating artifacts via the
interaction of a human initiative and a computational initiative. The computational
initiative is a PCG process and thus, we discuss this co-design approach under the
heading of procedural content generation. Although the term mixed-initiative lacks
a concrete definition [497], in this book we define it as the process that considers
both the human and the computer proactively making content contributions to the
game design task although the two initiatives do not need to contribute to the same
degree [774]. Mixed-initiative PCG thus differs from other forms of co-creation,
such as the collaboration of multiple human creators or the interaction between a
human and non-proactive computer support tools (e.g., spell-checkers or image
editors) or non-computer support tools (e.g., artboards or idea cards). The initiative
of the computer can be seen as a continuum between the no initiative state, leaving
full control of the design to the designer and having the computer program simply
carry out the commands of the human designer, to the full initiative state which
yields an autonomously creative system. Any state between the two is also possible as
we will see in the examples below and as depicted in Fig. 4.12.

Fig. 4.12 The mixed-initiative spectrum between human and computer initiative (or creativity)
across a number of mixed-initiative design tools discussed in this section. Iconoscope is a
mixed-initiative drawing game [372]. Sentient Sketchbook is a level editor for strategy games [379].
Sentient World is a mixed-initiative map editor [380] and Spaceship Design is a mixed-initiative
(mostly computer initiative) tool powered by interactive evolution.


4.4.1.1 Game Domains 

While the process of AI-assisted game design is applicable to any creative facet 
within game design 0811 it is level design that has benefited the most from it. 
Within commercial-standard game development, we can find AI-based tools that 
allow varying degrees of computer initiative. On one end of the scale, level editors 
such as the Garden of Eden Creation Kit (Bethesda, 2009) or game engines such 
as the Unreal Development Kit (Epic Games, 2009) leave most of the creative process 
to the designer but they, nevertheless, boost game development through automating 
interpolations, pathfinding and rendering 17741 . On the other end of the computer 
initiative scale, PCG tools specialized in, e.g., vegetation, such as SpeedTree (IDV, 
2002), or FPS levels, such as Oblige (Apted, 2007), only require the designer to set a 
small number of generation parameters and, thus, the generation process is almost 
entirely autonomous. 

Within academia, the area of AI-assisted game design tools has seen significant 
research interest in recent years M785II with contributions mainly to the level 
design task across several game genres including platformers IMD, strategy games 
113801137911774113781 (see Fig. 4.13(a)), open world games 06341 , racing games 
01020 , casual puzzle games 16141 (see Fig. 4.13(b)), horror games 03941 , first-person 
shooters IMl, educational games ll89l 13721 , mobile games 04821 , and adventure 
games 03230 . The range of mixed-initiative game design tools extends from tools that 
are designed to assist with the generation of complete game rulesets, such as the 
MetaGame 05221 , the RuLearn 06991 and the Ludocore 06391 tools, to tools that are 
designed to generate narratives 0480116731 and stories within games 03581 . 


4.4.1.2 Methods 

Any PCG approach could potentially be utilized for mixed-initiative game design. 
The dominant methods that have so far been used, however, rely on evolutionary 
computation following the search-based PCG paradigm. Even though evolution, at 
first sight, does not appear to be the most attractive approach for real-time processing 
and generation of content, it offers great advantages associated, in particular, with 
the stochasticity of artificial evolution, diversity maintenance and potential for 
balancing multiple design objectives. Evolution can be constrained to the generation of 
playable, usable, or, in general, content of particular qualities within desired design 
specifications. At the same time, it can incorporate metrics such as novelty 03820 or 
surprise Il240l for maximizing the diversity of generated content, thus enabling a 
change in the creative path of the designer 07741 . Evolution can be computationally 
costly, however, and thus, interactive evolution is a viable and popular alternative 
for mixed-initiative evolutionary-based generation (e.g., see 01021138011377115011 ). 
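
To make the combination of constraints and novelty more concrete, the following Python sketch evolves small tile strips under a feasibility constraint while rewarding novelty. It is only an illustration under our own assumptions: the level encoding, the `is_feasible` rule, the Hamming-distance novelty measure and all parameter values are hypothetical and are not taken from any of the tools cited above.

```python
import random

WIDTH = 12      # hypothetical level length (a 1-D strip of tiles)
TILES = "._#"   # '.' floor, '_' gap, '#' wall (illustrative encoding)


def is_feasible(level):
    """Toy playability constraint: no two adjacent gaps, at most 3 walls."""
    return "__" not in level and level.count("#") <= 3


def distance(a, b):
    """Hamming distance between two equally long levels."""
    return sum(x != y for x, y in zip(a, b))


def novelty(level, others, k=3):
    """Average distance to the k nearest neighbours among `others`."""
    nearest = sorted(distance(level, o) for o in others)[:k]
    return sum(nearest) / len(nearest) if nearest else 0.0


def mutate(level, rng):
    """Replace one randomly chosen tile."""
    i = rng.randrange(len(level))
    return level[:i] + rng.choice(TILES) + level[i + 1:]


def evolve(generations=30, pop_size=20, seed=0):
    rng = random.Random(seed)
    population = ["." * WIDTH for _ in range(pop_size)]  # trivially feasible
    archive = []
    for _ in range(generations):
        offspring = [mutate(p, rng) for p in population]
        # Constraint handling: infeasible offspring are simply discarded.
        candidates = population + [c for c in offspring if is_feasible(c)]
        scored = []
        for i, c in enumerate(candidates):
            # Compare against the archive and every other candidate.
            others = archive + candidates[:i] + candidates[i + 1:]
            scored.append((novelty(c, others), c))
        scored.sort(key=lambda s: s[0], reverse=True)
        population = [c for _, c in scored[:pop_size]]
        archive.append(population[0])  # remember the most novel level
    return population


suggestions = evolve()
```

In a mixed-initiative tool the resulting `suggestions` would be shown to the designer, who remains free to accept, edit or ignore them; an interactive-evolution variant would replace the novelty score with the designer's own selections.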

Beyond artificial evolution, another class of algorithms that is relevant for 
mixed-initiative content generation is constraint solvers and constraint optimization. 
Methods such as answer set programming 0383ll69ll have been used in several AI-assisted 
level design tools including Tanagra HMD for platformers and Refraction 18^ for 
educational puzzle games. Artificial neural networks can also perform certain tasks 
in a mixed-initiative manner, such as performing “autocomplete” or level repair 
02960 through the use of deep learning approaches such as stacked autoencoders. 
The goal here is to provide a tool that “fills in” parts of a level that the human 
designer does not want or have time to create, and corrects other parts to achieve 
further consistency. 

Fig. 4.13 Examples of mixed-initiative level design tools. Sentient Sketchbook (a) offers map 
sketch suggestions to the designer via artificial evolution (see rightmost part of the image); the 
suggestions are evolved to either maximize particular objectives of the map (e.g., balance) or are 
evolved to maximize the novelty score of the map. In Ropossum (b) the designer may select to 
design elements of Cut the Rope (Chillingo, 2010) game levels; the generation of the remaining 
elements is left to evolutionary algorithms. 


4.4.2 Autonomous 

The role of autonomous generation is arguably the most dominant PCG role in 
games. The earlier parts of this chapter are already dedicated to extensive discussions 
and studies of PCG systems that do not consider the designer in their creative 
process. As a result we will not cover this PCG role in further detail here. What is 
important to discuss, however, is the fuzzy borderline between mixed-initiative 
and autonomous PCG systems. It might be helpful, for instance, to consider 
autonomous PCG as the process by which the role of the designer starts and ends 
with an offline setup of the algorithm. For instance, the designer is only involved 
in the parameterization of the algorithm as in the case of SpeedTree (IDV, 2002). 
One might wish, however, to further push the borderline between autonomous and 
mixed-initiative generation and claim that generation is genuinely autonomous only 
if the creative process reconsiders and adapts the utility function that drives the 
content generation, thereby becoming creative in its own right. A static utility function 
that drives the generation is often referred to as mere generation within the 
computational creativity field 038111 . 

While the line between autonomy and collaboration with designers is still an 
open research question, for the purposes of this book, we can safely claim that the 
PCG process is autonomous when the initiative of the designer is limited to 
algorithmic parameterizations before the generation starts. 


4.4.3 Experience-Driven 


As games offer one of the most representative examples of rich and diverse content 
creation applications and are elicitors of unique user experiences, experience-driven 
PCG (EDPCG) 0783117841 views game content as the building block of games and 
the generated games as the potentiators of player experience. Based on the above, 
EDPCG is defined as a generic and effective approach for the optimization of user 
(player) experience via the adaptation of the experienced content. According to the 
experience-driven role of PCG in games, player experience is the collection of affective 
patterns elicited, cognitive processes emerged and behavioral traits observed 
during gameplay jTsn . 

By coupling player experience with procedural content generation, the 
experience-driven perspective offers a new, player-centered role to PCG. Since games are 
composed of game content that, when played by a particular player, elicits experience 
patterns, one needs to assess the quality of the content generated (linked to the 
experience of the player), search through the available content, and generate content 
that optimizes the experience for the player (see Fig. 4.14). In particular, the key 
components of EDPCG are: 

• Player experience modeling: player experience is modeled as a function of 
game content and player. 

• Content quality: the quality of the generated content is assessed and linked to 
the modeled experience. 

• Content representation: content is represented accordingly to maximize search 
efficacy and robustness. 

• Content generator: the generator searches through the generative space for 
content that optimizes the experience for the player according to the acquired model. 

Fig. 4.14 The four key components of the experience-driven PCG framework. 

Each component of EDPCG has its own dedicated literature and the extensive 
review of each is covered in other parts of the book. In particular, player experience 
modeling is covered in Chapter 5 whereas the remaining three components 
of the framework have already been covered in this chapter. A detailed survey and 
discussion about EDPCG is available in II783I . 
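
The interplay of the four EDPCG components can be sketched in a few lines of Python. The sketch below is a toy under our own assumptions: the two-parameter level vector, the quadratic `experience_model` and the `preferred_challenge` player attribute are hypothetical stand-ins for a learned player experience model, and exhaustive search is used only because the illustrative generative space is tiny.

```python
from itertools import product

# Content representation: a level is a short parameter vector
# (number of gaps, gap width). Both ranges are hypothetical.
GAP_COUNTS = range(0, 6)
GAP_WIDTHS = range(1, 5)


def experience_model(level, player):
    """Player experience model: predicted engagement as a function of
    content and player. The quadratic form stands in for a learned model."""
    gaps, width = level
    challenge = gaps * width
    return -(challenge - player["preferred_challenge"]) ** 2


def content_quality(level, player):
    """Content quality: here simply the modeled experience of this player."""
    return experience_model(level, player)


def generate(player):
    """Content generator: exhaustive search over the generative space."""
    space = product(GAP_COUNTS, GAP_WIDTHS)
    return max(space, key=lambda level: content_quality(level, player))


casual = {"preferred_challenge": 2}
expert = {"preferred_challenge": 12}
casual_level = generate(casual)
expert_level = generate(expert)
```

Swapping the body of `experience_model` for a model trained on player data, or the body of `generate` for an evolutionary search, changes one component without touching the others, which is the main point of the decomposition.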


4.4.3.1 Experience-Driven PCG in Practice 

Left 4 Dead (Valve, 2008) is an example of the use of experience-driven PCG in a 
commercial game where an algorithm is used to adjust the pacing of the game on 
the fly based on the player’s emotional intensity. In this case, adaptive PCG is used 
to adjust the difficulty of the game in order to keep the player engaged ||60l. Adaptive 
content generation can also be used with another motive such as the generation 
of more content of the kind the player seems to like. This approach was followed, 
for instance, in the Galactic Arms Race II250I game where the weapons presented 
to the player are evolved based on her previous weapon use and preferences. In another 
EDPCG study, El-Nasr et al. implemented a direct fitness function, derived 
from visual attention theory, for the procedural generation of lighting II188111851 . 
The procedural Zelda game engine ||257| , a game engine designed to emulate the 
popular The Legend of Zelda (Nintendo, 1986-2017) action-RPG game series, is 
built mainly to support experience-driven PCG research. 

Fig. 4.15 Example levels generated for two different Mario players: (a) Human; 
(b) World-Champion AI. The generated levels maximize the modeled fun value for each 
player. The level on top is generated for one of the experiment subjects that 
participated in (SD while the level below is generated for the world champion agent 
of the Mario AI competition in 2009. 

Another example is the work by Pedersen et al. mn, who modified an open-source 
clone of the classic platform game Super Mario Bros (Nintendo, 1985) to allow for 
personalized level generation. The realization of EDPCG in this example is illustrated 
in Fig. 4.16. The first step was to represent the levels in a format that would yield an 
easily searchable space. A level was represented as a short parameter vector describing 
the number, size and placement of gaps which the player can fall through, and the 
presence or absence of a switching mechanic. The next step was to create a model 
of player experience based on the level played and the player’s playing style. Data 
was collected from hundreds of players, who played pairs of levels with different 
parameters and were asked to rank which of the two levels best induced each of the 
following user states: fun, challenge, frustration, predictability, anxiety, boredom. 
While playing, the game also recorded a number of metrics of the players’ playing 
styles, such as the frequency of jumping, running and shooting. This data was then 
used to train neural networks to predict the examined user states using evolutionary 
preference learning. Finally, these player experience models were utilized to optimize 
game levels for particular players 06171 . Two examples of such levels can be 
seen in Fig. 4.15. It is worth noting, as discussed in Chapter 5, that one may wish 
to further improve the models of experience of Mario players by including information 
about the player beyond gameplay such as her head pose 06101 or her 
facial expressions. 
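
The idea behind evolutionary preference learning can be illustrated with a deliberately small Python sketch. It is not the setup of the study above: we assume a linear model instead of a neural network, a (1+1) evolution strategy instead of a full evolutionary algorithm, and four made-up preference pairs; all names and values are hypothetical. The fitness of a model is the fraction of reported pairwise preferences that its output ordering agrees with.

```python
import random


def predict(weights, features):
    """Linear stand-in for the neural network used in the study."""
    return sum(w * f for w, f in zip(weights, features))


def agreement(weights, pairs):
    """Fraction of pairs where the model ranks the preferred item higher."""
    hits = sum(predict(weights, a) > predict(weights, b) for a, b in pairs)
    return hits / len(pairs)


def train(pairs, n_features, generations=200, sigma=0.3, seed=1):
    """(1+1) evolution strategy: keep a mutated model if it is no worse."""
    rng = random.Random(seed)
    best = [0.0] * n_features
    best_fit = agreement(best, pairs)
    for _ in range(generations):
        child = [w + rng.gauss(0.0, sigma) for w in best]
        fit = agreement(child, pairs)
        if fit >= best_fit:
            best, best_fit = child, fit
    return best, best_fit


# Hypothetical data: each item is (number_of_gaps, jumps_per_second);
# in every pair the first item was reported as more fun than the second.
pairs = [
    ((4, 0.9), (1, 0.2)),
    ((3, 0.7), (2, 0.1)),
    ((5, 0.8), (2, 0.6)),
    ((3, 0.5), (0, 0.4)),
]
weights, fit = train(pairs, n_features=2)
```

Once trained, such a model can serve as the content quality function of a level generator: candidate parameter vectors are scored with `predict` and the highest-scoring level is presented to the player.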


4.4.4 Experience-Agnostic 

With experience-agnostic PCG we refer to any PCG approach that does not consider 
the role of the player in the generation of content. But where should we set 
the boundary of involvement? When do we consider a player as part of the generation 
process and when don’t we? While the borderline between experience-driven 
and experience-agnostic is not trivial to draw, we define as experience-agnostic any 
PCG approach whose content quality function does not include a player (experience) 
model or that does not interact with the player in any way during generation. As 
with the role of autonomous PCG, this chapter has already gone through several 
examples of content generation that do not involve a player or a player experience 
model. To avoid being repetitive we will refer the reader to the PCG studies covered 
already that are outside the definition of experience-driven PCG. 

Fig. 4.16 The EDPCG framework in detail. The gradient grayscale-colored boxes represent a 
continuum of possibilities between the two ends of the box while white boxes represent discrete, 
exclusive options within the box. The blue arrows illustrate the EDPCG approach followed for 
the Super Mario Bros (Nintendo, 1985) example study I521II6171 : Content quality is assessed via 
a direct, data-driven evaluation function which is based on a combination of a gameplay-based 
(model-free) and a subjective (pairwise preference) player experience modeling approach; content 
is represented indirectly and exhaustive search is applied to generate better content. 


4.5 What Could Be Generated? 

In this section we briefly outline the possible content types that a PCG algorithm 
can generate in a game. Generally speaking, Liapis et al. 0811 identified six creative 
domains (or facets) within games that we will follow for our discussion in 
this section. These include level architecture (design), audio, visuals, rules (game 
design), narrative, and gameplay. In this chapter we will cover the first five facets and 
we purposely exclude the gameplay facet. Creative gameplay is directly associated 
with play and as such is covered in the previous chapter. We conclude this section 
with a discussion on complete game generation. 


4.5.1 Levels and Maps 

The generation of levels is by far the most popular use of PCG in games. Levels 
can be viewed as necessary content since every game has some form of spatial 
representation or virtual world within which the player can perform a set of actions. 
The properties of the game level, in conjunction with the game rules, frame the 
ways a player can interact with the world and determine how the player can progress 
from one point in the game to another. The game’s level design contributes to the 
challenges a player faces during the game. While games would often have a fixed set 
of mechanics throughout, the way a level is designed can influence the gameplay and 
the degree of game challenge. For that reason, a number of researchers have argued 
that levels coupled with game rules define the absolutely necessary building blocks 
of any game; in that regard the remaining facets covered below are optional 03711 . 
The variations of possible level designs are endless: a level representation can vary 
from simple two-dimensional illustrations of platforms and coins, as in the Super 
Mario Bros (Nintendo, 1985) series, to the constrained 2D space of Candy Crush 
Saga (King, 2012), to the three-dimensional and large urban spaces of Assassin’s 
Creed (Ubisoft, 2007) and Call of Duty (Infinity Ward, 2003), to the 2D elaborated 
structures of Angry Birds (Rovio, 2009), to the voxel-based open gameworld of 
Minecraft (Mojang, 2011). 

Due to their several similarities we can view the procedural generation of game 
levels through the lens of procedural architecture. Similarly to architecture, level design 
needs to consider both the aesthetic properties and the functional requirements of 
whatever is designed within the game world. Depending on the game genre, functional 
requirements may vary from a reachable end-goal for platform games, to 
challenging gameplay in driving games such as Forza Motorsport (Turn 10 Studios, 
2005), to waves of gameplay intensity as in Pac-Man (Namco, 1980), Left 4 
Dead (Valve, 2008), Resident Evil 4 (Capcom, 2005) and several other games. A 
procedural level generator also needs to consider the aesthetics of the content as 
the level’s aesthetic appearance may have a significant impact not only on the visual 
stimuli it offers to the player but also on navigation. For example, a sequence 
of identical rooms can easily make the player disoriented, as was intended in the 
dream sequences of Max Payne (Remedy, 2001), while dark areas can add to the 
challenge of the game due to low visibility or augment the player’s arousal as in 
the case of Amnesia: The Dark Descent (Frictional Games, 2010), Nevermind (Flying 
Mollusk, 2015) and Sonancia 0941 . When the level generator considers larger, 
open levels or gameworlds then it draws inspiration from urban and city planning 
M, with edges to constrain player freedom, landmarks to orient the player and 
motivate exploration USD (as in the Grand Theft Auto (Rockstar Games, 1997) series), 
and districts and areas to break the world’s monotony (as in World of Warcraft 
(Blizzard Entertainment, 2004), which uses highly contrasting colors, vegetation and 
architectural styles to split the world into districts that are suitable for characters of 
different level ranges). 

Fig. 4.17 The procedurally generated levels of Diablo (Blizzard Entertainment, 1996): one of the 
most characteristic examples of level generation in commercial games. Diablo is a relatively recent 
descendant of the Rogue (Toy and Wichman, 1980) role-playing game (i.e., a rogue-like), characterized 
by dungeon-based procedurally generated game levels. Image obtained from Wikipedia (fair use). 

As already seen broadly in this chapter, the generation of levels in a procedural 
manner is clearly the most popular and possibly the oldest form of PCG in the game 
industry. We already mentioned the early commercial use of PCG for automatic level 
design in games such as Rogue (Toy and Wichman, 1980) and the Rogue-inspired 
Diablo series (Blizzard Entertainment, 1996; see Fig. 4.17), and the more recent 
world generation examples of Civilization IV (Firaxis, 2005) and Minecraft (Mojang, 
2011). The level generation algorithms used in commercial games are usually 
constructive, in particular in games where players can interact with and change the 
game world via play. Players of Spelunky (Yu and Hull, 2009), for instance, are allowed 
to modify a level which is not playable (i.e., the exit cannot be reached) by 
blowing up the blocking tiles with a bomb provided by the game level. 

Fig. 4.18 The caricaturized and highly-emotive visuals of the game Limbo (Playdead, 2010). 
Image obtained from Wikipedia (fair use). 

The academic interest in procedural level generation is only recent 0720117831 
161611 but it has produced a substantial volume of studies already. Most of the 
academic studies described in this chapter are focused on levels. The interested reader 
may refer to those for examples of level generation across various methodologies, 
PCG roles and game genres. 


4.5.2 Visuals 


Games are, by definition, visual media unless the game is designed explicitly to 
not have visuals, e.g., the Real Sound: Kaze no Regret (Sega, 1997) adventure audio 
game. The visual information displayed on the screen conveys messages to the 
player which are dependent on the graphical style, color palette and visual texture. 
Visuals in games can vary from simple abstract and pixelized representations such as the 
8-bit art of early arcade games, to caricaturized visuals as in Limbo (Playdead, 2010) 
(see Fig. 4.18), to photorealistic graphics as in the FIFA series (EA Sports, 1993) 
Il299l . 


Within the game industry PCG has been used broadly for the generation of any 
of the above visual types. Arguably, the complexity of the visual generation task 
increases the more the resolution and the photorealism of the desired output 
increases. There are examples of popular generative tools such as SpeedTree (IDV, 
2002) for vegetation and FaceGen (Singular Inversions, 2001) for faces, however, 
that can successfully output photorealistic 3D visuals. Within academia notable 
examples of visual generation are the games Petalz 0565115661 (flower generation), 
Galactic Arms Race GSol (weapon generation; see also Fig. 4.3) and AudioInSpace 
ETSI (weapon generation). In all three games visuals are represented by neural 
networks that are evolved via interactive evolution. Within the domain of weapon 
particle generation another notable example is the generation of surprising yet 
balanced weapons for the game Unreal Tournament III (Midway Games, 2007) using 
constrained surprise search M240II ; an algorithm that maximizes the surprise score 
of a weapon but at the same time imposes designer constraints on it so that it is 
balanced. Other researchers have been inspired by theories about “universal” properties 
of beauty ifTSl to generate visuals of high appeal and appreciation 13771 . The PCG 
algorithm in that study generates spaceships based on their size, simplicity, balance 
and symmetry, and adapts its visual output to the taste of the visual’s designer via 
interactive evolution. The PCG-assisted design process referred to as iterative 
refinement 03801 is another way of gradually increasing the resolution of the visuals a 
designer creates by being engaged in an iterative and creative dialog with the visuals 
generator. Beyond the generation of in-game entities a visuals PCG algorithm may 
focus on general properties of the visual output such as pixel shaders 12821 , lighting 
0188111851 , brightness and saturation, which can all influence the overall appearance 
of any game scene. 


4.5.3 Audio 

Even though audio can be seen as optional content it can affect the player directly 
and its impact on player experience is apparent in most games 0219ll22l1ll29l . Audio 
in games has reached a great level of maturity as demonstrated by two BAFTA 
Game Awards and an MTV video music award for best video game soundtrack 
HMD. The audio types one meets in games may vary from fully orchestrated soundtrack 
(background) music, as in Skyrim (Bethesda, 2011), to sound effects, such as the 
dying or pellet-eating sounds of Pac-Man (Namco, 1980), to the voice-acted sounds of 
Fallout 3 (Bethesda, 2008). Most notably within indie game development, Proteus 
(Key and Kanaga, 2013) features a mapping between spatial positioning, visuals 
and player interaction, which collectively affect the sounds that are played. 
Professional tools such as the sound middleware of UDK (Epic Games, 2004) and the 
popular sfxr and bfxr sound generation tools provide procedural sound components 
to audio designers, demonstrating a commercial interest in and need for procedurally 
generated audio. 

At first glance, the generation of game audio, music and sounds might not seem 
to be particularly different from any other type of audio generation outside games. 
Games are interactive, however, and that particular feature makes the generation of 
audio a rather challenging task. When it comes to the definition of procedural audio 
in games, a progressive stance has been that its procedurality is caused by the very 
interaction with the game. (For instance, game actions that cause sound effects or 
music can be considered as procedural audio creation 0220111281 .) Ideally, the game 
audio must be able to adapt to the current game state and the player behavior. As a 
result adaptive audio is a grand challenge for composers since the combinations of 
all possible game and player states could be largely unknown. Another difficulty for 
the autonomous generation of adaptive music, in particular, is that it requires 
real-time composition and production, both of which need to be embedded in a game 
engine. Aside from a few efforts in that direction,^ the current music generation 
models are not particularly designed to perform well in games. In the last decade, 
however, academic work on procedural game music and sound has seen substantial 
advancements in venues such as the Procedural Content Generation and the Musical 
Metacreation workshop series. 

Generally speaking, sound can be either diegetic or non-diegetic. A sound is 
diegetic if its source is within the game’s world. The source of the diegetic sound 
can be visible on the screen (on-screen) or can be implied to be present at the current 
game state (off-screen). Diegetic sounds include characters, voices, sounds made by 
items on-screen or interactions of the player, or even music represented as coming 
from instruments available in the game. A non-diegetic sound, on the other hand, is 
any sound whose source is outside the game’s world. As such non-diegetic sounds 
cannot be visible on-screen or even implied to be off-screen. Examples include com- 
mentaries of a narrator, sound effects which are not linked to game actions, and 
background music. 

A PCG algorithm can generate both diegetic and non-diegetic sounds including 
music, sound effects and commentaries. Examples of non-diegetic sound generation 
in games include the Sonancia horror sound generator that tailors the tension of the 
game to the desires of the designer based on the properties of the game level M394I . 
The mapping between tension and sounds in Sonancia has been derived through 
crowdsourcing 03961 . Similarly to Sonancia, and towards exploring the creative 
space between audio and level design, Audioverdrive generates levels from audio 
and audio from levels 02730 . Notably within diegetic audio examples, Scirea et al. 
||606l explore the relationship between procedurally generated music and narrative. 
Studies have also considered the presence of game characters on-display for the 
composition of game soundtracks ll7^ or the combination of short musical phrases 
that are driven by in-game events and, in turn, create responsive background audio 
for strategy games |280| . 

Finally, it is worth mentioning that there are games featuring PCG that use music 
as the input for the generation of other creative domains rather than music per se. 
For example, games such as Audiosurf (Fitterer, 2008) and Vib Ribbon (Sony 
Entertainment, 2000) do not procedurally generate music but they instead use music 
to drive the generation of levels. AudioInSpace II275I is another example of a 
side-scrolling space shooter game that does not generate music but uses the background 
music as the basis for weapon particle generation via artificial evolution. 
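
As a minimal illustration of music driving level generation, the Python sketch below turns the loudness envelope of an audio clip into a one-dimensional track with obstacles. It does not describe how any of the games above work: the windowed mean-amplitude envelope, the threshold and the tile symbols are all assumptions chosen for brevity, and the clip is made-up data.

```python
def loudness_envelope(samples, window):
    """Mean absolute amplitude over consecutive, non-overlapping windows."""
    return [
        sum(abs(s) for s in samples[i:i + window]) / window
        for i in range(0, len(samples) - window + 1, window)
    ]


def level_from_audio(samples, window=4, threshold=0.5):
    """Place an obstacle ('^') where the music is loud, flat track ('_') elsewhere."""
    return "".join(
        "^" if loud > threshold else "_"
        for loud in loudness_envelope(samples, window)
    )


# Hypothetical audio clip: quiet, loud, quiet, loud (amplitudes in [-1, 1]).
clip = [0.1, -0.1, 0.2, -0.2,
        0.9, -0.8, 0.7, -0.9,
        0.0, 0.1, -0.1, 0.0,
        0.6, -0.7, 0.8, -0.6]
track = level_from_audio(clip)
print(track)  # _^_^
```

A practical system would work on real audio features such as beats or spectral energy, but the mapping from a musical feature to a content parameter follows the same pattern.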


^ For instance, see the upcoming melodrive app at: http://melodrive.com. 


4.5.4 Narrative 

Many successful games rely heavily on their narratives; the clear distinction, 
however, between such narratives and traditional stories is the interactivity element 
that is offered by games. Now, whether games can tell stories 0131 or games are 
instead a form of narrative IT] is still an open research question within game 
studies, and beyond the scope of this book. The study of computational (or procedural) 
narrative focuses on the representational and generational aspects of stories as those 
can be told via a game. Stories can play an essential part in creating the aesthetics 
of a game which, in turn, can impact affective and cognitive aspects of the playing 
experience mni. 

By breaking the game narrative into subareas of game content we can find core 
game content elements such as the game’s plotline II562112291 , but also the ways a 
story is represented in the game environment 07301 l83l . The coupling of a game’s 
representation and the story of the game is of vital importance for player 
experience. Stories and plots take place in an environment and are usually told via 
a virtual camera lens. The behavior of the virtual camera, viewed as a parameterized 
element of computational narrative, can drastically influence the player’s 
experience. That can be achieved via an affect-based cinematographic representation 
of multiple cameras as those used in Heavy Rain (Quantic Dream, 2010) or 
through an affect-based automatic camera controller as that used in the Maze-Ball 
game GHq). Choosing the best camera angle to highlight an aspect of a story can be 
seen as a multi-level optimization problem, and approached with combinations of 
optimization algorithms ll85l . Games such as World of Warcraft (Blizzard 
Entertainment, 2004) use cut scenes to raise the story’s climax and lead the player to 
particular player experience states. The creation or semi-automatic generation of stories 
and narratives belongs to the area of interactive storytelling, which can be viewed 
as a form of story-based PCG. The story can adjust according to the actions of the 
player targeting personalized story generation (e.g., see 1568111061 among others). 
Ultimately, game worlds and plot point story representations can be co-generated as 
demonstrated in a few recent studies (e.g., see 12481 ). 

Computational narrative methods for generating or adapting stories or expositions 
are typically built on planning algorithms, and planning is therefore essential 
for narrative 17921 . The space of stories can be represented in various ways, and the 
representations in turn make use of dissimilar search/planning algorithms, including 
traditional optimization and reinforcement learning approaches II483II117111061 . 
Cavazza et al. CMI, for instance, introduced an interactive storytelling system built 
with the Unreal game engine that uses Hierarchical Task Network planning to support 
story generation and anytime user intervention. Young et al. 17921 introduced an 
architecture called Mimesis, primarily designed to generate intelligent, plan-based 
character and system behavior at runtime with direct uses in narrative generation. 
Finally the IDtension engine II682I dynamically generates story paths based on the 
player’s choices; the engine was featured in Nothing for Dinner,^ a 3D interactive 
story aiming to help teenagers living challenging daily life situations at home. 
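
To give a flavor of plan-based story generation, the Python sketch below treats a plot as a sequence of actions found by search. Note the simplification: it uses plain breadth-first forward search over states, not the Hierarchical Task Network planning or the Mimesis architecture mentioned above, and the story world (facts, actions, goal) is entirely hypothetical.

```python
from collections import deque

# Each action maps to (preconditions, facts added, facts removed).
# The whole story world is a made-up example.
ACTIONS = {
    "hero travels":      ({"hero at village"}, {"hero at cave"}, {"hero at village"}),
    "hero finds sword":  ({"hero at cave"}, {"hero armed"}, set()),
    "hero slays dragon": ({"hero at cave", "hero armed", "dragon alive"},
                          {"dragon dead"}, {"dragon alive"}),
    "hero returns":      ({"hero at cave"}, {"hero at village"}, {"hero at cave"}),
}


def plan(initial, goal):
    """Breadth-first forward search; returns a shortest action sequence or None."""
    start = frozenset(initial)
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, story = frontier.popleft()
        if goal <= state:          # all goal facts hold in this state
            return story
        for name, (pre, add, remove) in ACTIONS.items():
            if pre <= state:       # action is applicable
                nxt = frozenset((state - remove) | add)
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, story + [name]))
    return None


story = plan(
    initial={"hero at village", "dragon alive"},
    goal={"dragon dead", "hero at village"},
)
print(story)
# ['hero travels', 'hero finds sword', 'hero slays dragon', 'hero returns']
```

Interactivity fits naturally into this picture: whenever the player changes the world state, the planner can be re-run from the new state so that the remaining plot still reaches the author's goal.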

Similarly to dominant approaches of narrative and text generation, interactive 
storytelling in games relies heavily on stored knowledge about the (game) world. 
Games that rely on narratives, such as Heavy Rain (Quantic Dream, 2010), may 
include thousands of lines of dialog which are manually authored by several writers. 
To enable interactive storytelling the game should be able to select responses (or 
paths in the narrative) based on what the player will do or say, as in Façade em 
(see Fig. 4.19). To alleviate, in part, the burden of manually representing world 
knowledge, data-driven approaches can be used. For instance, one may crowdsource 
actions and utterance data from thousands of players that interact with virtual agents 
of a game and then train virtual agents to respond in similar ways using n-grams 
II508I . Or instead, one might design a system in which designers collaborate with 
a computer by taking turns on adding sentences in a story; the computer is able 
to provide meaningful sentences by matching the current story with similar stories 
available on the cloud 16731 . Alternatively, a designer could use the news of the day 
from sites, blogs or Wikipedia and generate games that tell the news implicitly via 
play (1371. 
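
The n-gram idea can be illustrated with a bigram model in Python. The sketch is a toy under our own assumptions and does not reproduce the cited system: the three logged utterances are invented, the `<s>` and `</s>` boundary tokens are a common convention we add, and the agent decodes greedily by always choosing the most frequent continuation.

```python
from collections import Counter, defaultdict


def train_bigrams(utterances):
    """Count, for every word, which words followed it in the logged utterances."""
    follows = defaultdict(Counter)
    for utterance in utterances:
        words = ["<s>"] + utterance.lower().split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows


def respond(follows, max_words=10):
    """Greedy decoding: always pick the most frequent continuation."""
    word, reply = "<s>", []
    while len(reply) < max_words:
        if word not in follows:
            break
        word = follows[word].most_common(1)[0][0]
        if word == "</s>":
            break
        reply.append(word)
    return " ".join(reply)


# Hypothetical crowdsourced log of what players said to a waiter character.
log = [
    "can i see the menu",
    "may i see the menu",
    "can i have the bill",
]
model = train_bigrams(log)
reply = respond(model)
print(reply)  # can i see the menu
```

Sampling from the counts instead of taking the maximum would give more varied replies, and conditioning on the preceding player action rather than only the previous word moves the model closer to selecting responses in context.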

Research on interactive narrative and story-based PCG benefits from and influences 
the use of believable agents that interact with the player and are interwoven 
in the story plot. The narrative can yield more (or less) believability to 
agents and thus the relationship between agents and the story they tell is important 
0801114011153II [1061 . In that sense, the computational narrative of a game may 
define the arena for believable agent design. Research on story-based PCG has also 
influenced the design and development of games. Starting from popular independent 
attempts like Façade em (see Fig. 4.19), Prom Week 114481 and Nothing for 
Dinner to the commercial success of The Elder Scrolls V: Skyrim (Bethesda Softworks, 
2011), Heavy Rain (Quantic Dream, 2010) and Mass Effect (Bioware, 2007), 
narrative has traditionally been amongst the key factors of player experience and 
immersion, particularly in narrative-heavy games such as the ones mentioned above. 

Examples of sophisticated computational narrative techniques crossing over
from academia to commercial-standard products include the storytelling system
Versu [197], which was used to produce the game Blood & Laurels (Emily Short,
2014). For the interested reader, the interactive fiction database contains a detailed
list of games built on the principles of interactive narrative and fiction, and the
storygen.org repository, by Chris Martens and Rogelio E. Cardona-Rivera, maintains
existing openly available computational story generation systems. Finally, note that
the various ways AI can be used to play text-based adventure games and interactive
fiction are covered elsewhere in this book.


Nothing for Dinner: http://nothingfordinner.org
Interactive Fiction Database: http://ifdb.tads.org/
Story generation repository: http://storygen.org/

Fig. 4.19 A screenshot from the Façade game featuring its main characters: Grace and Trip.
The seminal design and AI technologies of Façade popularized the vision of interactive narrative
and story-based PCG within games.


4.5.5 Rules and Mechanics 

The game rules frame the playing experience by providing the conditions of play— 
for instance, winning and losing conditions—and the actions available to the player 
(game mechanics). Rules constitute necessary content as they are in a sense the core 
of any game, and a game’s rules pervade it. 

For most games, the design of their ruleset largely defines them and contributes 
to their success. It is common that the rule set follows some standard design patterns
within its genre. For example, the genre of platform games is partly defined by run- 
ning and jumping mechanics, whereas these are rare in puzzle games. Evidently, the 
genre constrains the possibility (design) space of the game rules. While this prac- 
tice has been beneficial—as rule sets are built on earlier successful paradigms—it 
can also be detrimental to the creativity of the designer. It is often the case that the 
players themselves can create new successful game variants (or even sub-genres) by 
merely altering some rules of an existing game. A popular example is the modifi- 
cation of Warcraft III (Blizzard, 2002) which allowed the player to control a single 
“hero” unit and, in turn, gave rise to a new, popular game genre named Multiplayer 
Online Battle Arenas (MOBA).

Most existing approaches to rule generation take a search-based approach, and
are thus dependent on some way of evaluating a set of rules 071II13551 . However,
accurately estimating the quality of a set of game rules is very hard. Game rules 
differ from most other types of game content in that they are almost impossible to 
evaluate in isolation from the rest of the game. While levels, characters, textures, and 
many other types of content can to some extent be evaluated outside of the game, 
looking at a set of rules generally gives very little information about how they play. For
a human, the only way to truly experience the rules of a game is to play the game. 
For a computer, this would translate to simulating gameplay in some way in order 
to evaluate the rules. (In this sense, rules can be said to be more similar to program 
code than they are to e.g., pictures or music.) 

So how can simulated playthroughs be used to judge the quality of the rulesets? 
Several ideas about how to judge a game depending on how agents play it have been 
introduced. The first is balance; for symmetric two-player games in particular, balance
between the winning chances of the two players is generally positive [274].
Another idea is outcome uncertainty, meaning that any particular game should be
“decided” as late as possible. Yet another idea is learnability: a good game,
including its ruleset, is easy to learn and hard to master. In other words, it should
have a long, smooth learning curve for the player, as learning to play the game is
part of what makes it fun. This idea can be found expressed most clearly in Koster’s
“Theory of Fun” [351] but can also be said to be implicit in Schmidhuber’s theory
of artificial curiosity [602] and in theories in developmental psychology [204].
Within work on game rule generation, attempts have been made to capture this idea
in different ways. One way is to use a reinforcement learning agent to try to learn to
play the game; games where the agent improves the most over a long time score the
best. Another way of capturing this idea is to use several agents of different
skill levels to try to play the game. Games score better when they maximize the
performance difference between these agents [491]. This idea is also related to the
idea of depth in games, which can be seen as the length of the chain of heuristics
that can be acquired in a game [362].
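
Two of the ideas above, balance and outcome uncertainty, can be estimated directly
from simulated playthroughs. The following Python sketch is only an illustration
under stated assumptions: the dice race game, its head_start rule parameter, the
random agents and the particular formulas for balance and for the "decision point"
of a game are all invented here for the example and are not taken from the cited
studies.

```python
import random


def balance(winners):
    """Return 1.0 for a perfectly even win split and 0.0 for a one-sided one."""
    wins0 = winners.count(0)
    wins1 = winners.count(1)
    return 1.0 - abs(wins0 - wins1) / (wins0 + wins1)


def decision_point(leader_trace):
    """Fraction of the game that passes before the final leader is no longer overtaken."""
    final_leader = leader_trace[-1]
    settled_at = 0
    for turn, leader in enumerate(leader_trace):
        if leader != final_leader:
            settled_at = turn + 1
    return settled_at / len(leader_trace)


def play_race(head_start, rng, target=20):
    """Toy dice race: player 0 moves first and starts `head_start` squares ahead."""
    position = [head_start, 0]
    leader_trace = []
    while True:
        for player in (0, 1):
            position[player] += rng.randint(1, 6)
            leader_trace.append(0 if position[0] >= position[1] else 1)
            if position[player] >= target:
                return player, leader_trace


def evaluate_ruleset(head_start, n_games=2000, seed=0):
    """Estimate (balance, mean decision point) from random-agent playthroughs."""
    rng = random.Random(seed)
    winners, points = [], []
    for _ in range(n_games):
        winner, trace = play_race(head_start, rng)
        winners.append(winner)
        points.append(decision_point(trace))
    return balance(winners), sum(points) / len(points)
```

A search-based rule generator could use such estimates as components of its fitness
function. In this toy game a head start of 19 squares lets the first player win on
the opening roll, so both estimates collapse to zero, whereas rulesets without a
head start are far more balanced and are decided later.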

Perhaps the most successful example of game rule generation within academic 
research is Ludi. Ludi follows a search-based PCG approach and evolves grammars
that represent the rules of board games (see Fig. 4.20). The fitness function
that drives the rule generation is composed of several metrics that estimate good
design patterns in board games, such as game depth and rule simplicity. A successfully
designed game that came out of the Ludi generator is named Yavalath (see
Fig. 4.20). The game is played on a 5-by-5 hexagonal board by two or three players.
Yavalath’s winning and losing conditions are very simple: you win if you make a
line of four tokens, whereas you lose if you make a line of three tokens; the game is
a draw if the board fills up.
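
The winning and losing conditions of Yavalath are simple enough to be written down
as a short Python sketch. The representation below is an assumption made for
illustration: cells are axial hexagonal coordinates (q, r), a player's tokens are a
set of such cells, board boundaries and the draw condition are left out, and a line
of four takes precedence when a single move creates both a four and a three. None of
the names come from Ludi itself.

```python
# The three line directions on a hexagonal grid in axial coordinates.
DIRECTIONS = [(1, 0), (0, 1), (1, -1)]


def run_length(stones, cell, direction):
    """Length of the unbroken line of `stones` through `cell` along `direction`."""
    dq, dr = direction
    length = 1
    for sign in (1, -1):
        q, r = cell[0] + sign * dq, cell[1] + sign * dr
        while (q, r) in stones:
            length += 1
            q, r = q + sign * dq, r + sign * dr
    return length


def move_result(own_stones, move):
    """Return "win", "loss" or None for the player placing a token on `move`."""
    stones = set(own_stones) | {move}
    lengths = [run_length(stones, move, direction) for direction in DIRECTIONS]
    if max(lengths) >= 4:
        return "win"
    if 3 in lengths:
        return "loss"
    return None
```

For instance, with tokens on (0, 0), (1, 0) and (3, 0), playing (2, 0) completes a
line of four and wins, whereas with tokens on (0, 0) and (1, 0) only, the same move
creates a line of three and loses.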

One of the earliest examples of video game rule generation is Togelius and
Schmidhuber’s experiment with generating simple Pac-Man-like games [716]. In
that study rules are evolved to maximize the learnability of player agents as measured
via simulated playthroughs. Another example is the work by Nielsen et al. in
which game rules are represented using the video game description language [492].
Using Answer Set Programming (a solver-based method) [69] rather than search-based
methods, rules have been generated for simple 2D games that, however, respect
constraints such as playability (i.e., the victory condition is attainable) [637].
It is fair to say that none of these attempts to generate video games have been able 
to produce good games, i.e., games that anyone (except the creator of the system 
that creates the games) would want to play. This points to the immense challenge in 
accurately estimating the quality of a video game rule set. One of the reasons this 
seems to be a more challenging problem than estimating the quality of a board game 
rule set is the time dimension, as human reaction time and ability to estimate and 
predict unfolding processes play an important role in the challenge of many video 
games. 

For more examples and in-depth analysis on rule and mechanics generation the
interested reader is referred to the “rules and mechanics” chapter of the PCG book.


4.5.6 Games 

Game generation refers to the use of PCG algorithms for computationally design- 
ing new complete games. The vast majority of PCG studies so far, however, have 
been very specific to a particular game facet or domain. It is, for instance, either 
a level that is generated or the audio for some level but rarely both. This is
surprising given that the different facets of a game are naturally interwoven.
A characteristic example of the interwoven nature among game facets
is the following: player actions—viewed as a manifestation of game rules—are
usually accompanied by corresponding sound effects such as the sound of Mario
jumping in Super Mario Bros (Nintendo, 1985). Now let us think of a PCG algorithm
that introduces a new rule to the game, and hence a new player action. The
algorithm automatically constrains the sound effects that can be associated with this
new action based on a number of factors such as the action’s duration, purpose and
overall contribution to the game plot. Actions and sounds appear to have a cause and
effect (or hierarchical) relationship and a PCG algorithm would naturally prioritize
the creation of the action before it generates its sound. Most relationships between
facets, however, are not strictly hierarchical or unidirectional. For example, a game
level can be successful because of a memorable landmark as much as the gameplay
it affords. Similarly, the narrative of a game relies on a multitude of factors
including the camera placement as well as the visuals and the sounds. 

The game generation topic has attracted growing interest in recent years even
though the relationship between the different game facets is largely not considered.
Most game generation projects focus on a single facet of a game and do not investigate
the interaction between different facets. The rule generator for Pac-Man-like
games [716], for instance, evolves different rules for different colored agents but it
does not evolve the agents’ color to indicate different playing strategies. Similarly
to the ghosts of Pac-Man (Namco, 1981) we could imagine that red represents an
aggressive behavior whereas orange represents a passive behavior.

Fig. 4.20 The Ludi game rule generator (a) and its game Yavalath (b); panel (b) shows a deluxe
version of the Yavalath game. Image (a) is adapted from [720]. Image (b) is published with
permission by Cameron Browne and Nestor Romeral Andres.

Among the few notable attempts at game generation, Game-O-Matic [723] is a
game generation system that creates games representing ideas. More specifically,
Game-O-Matic includes an authoring tool in which a user can enter entities and
their interactions via concept maps. The entities and interactions are translated,
respectively, into game visuals and game mechanics; the mechanics, however, do not
take into account the visuals or semantics of the game objects they are applied to.
One of the first preliminary discussions on how multi-facet integration might happen
is offered by Nelson and Mateas [484]. In their paper they present a system for
matching sprites (visuals) to very simple WarioWare-style mechanics. Their system
is somewhat similar to Game-O-Matic, but works a bit differently; instead of the
designer specifying verbs and nouns she wants a game to be about, she gives the
system constraints on how verbs and nouns relate in the game (for example, a chasing
game needs a “prey” sprite that is something that can do things like “flee” or “be
hunted” and so on). The system then uses ConceptNet and WordNet to generate
games that fit these constraints.

Arguably one of the most elaborate examples of game generation is ANGELINA
[135, 137]. ANGELINA is a game generator that has seen several developments
over the years and is currently capable of evolving the rules and levels of the
game, collecting and placing pieces of visuals and music (that are relevant to the
theme and the emotive mood of the game), giving names to the games it creates
and even creating simple commentaries that characterize them. ANGELINA is able
to generate games of different genres—including platformer games (see Fig. 4.21)
and 3D adventure games—some of which have even participated in game design
competitions.

The systems above make some initial, yet important, steps towards game generation
and they attempt to interweave the different domains within games in a meaningful
way, mostly in a hierarchical fashion. However, PCG eventually needs to
rise to the challenge of tackling the compound generation of multiple facets in an
orchestrated manner [171, 371]. An early study on the fusion of more than one
generative facet (domain) in games is the one performed recently by Karavolos et
al. [324]. The study employs machine learning-based PCG to derive the common
generative space—or the common patterns—of game levels and weapons in first-person
shooters. The aim of this orchestration process between level design and
game design is the generation of level-weapon couplings that are balanced. The unknown
mapping between level representations and weapon parameters is learned by
a deep convolutional neural network which predicts if a given level with a particular
set of weapons will be balanced or not. Balance is measured in terms of the win-lose
ratio obtained by AI bots playing in a deathmatch scenario. Figure 4.22 illustrates
the architecture used to fuse the two domains. For the reader interested in the orchestration
process, we further elaborate on the topic in the last chapter of this book;
some early discussions on this vision can also be found in [371].

ConceptNet: http://conceptnet.io/
WordNet: https://wordnet.princeton.edu/
ANGELINA: http://www.gamesbyangelina.org/

Fig. 4.21 ANGELINA’s Puzzling Present game. The game features an invert gravity mechanic that
allows a player to overcome the high obstacle on the left and complete the level. Image obtained
with permission from http://www.gamesbyangelina.org/.

Fig. 4.22 A convolutional neural network (CNN) architecture used for fusing levels and weapons
in first-person shooters. The network is trained to predict whether a combination of a level and a
weapon would yield a balanced game or not. The CNN can be used to orchestrate the generation
of a balanced level given a particular weapon and vice versa. Image adapted from [324].


4.6 Evaluating Content Generators

Creating a generator is one thing; evaluating it is another. Regardless of the method 
followed, all generators should be evaluated on their ability to achieve the desired goals
of the designer. Arguably, the generation of any content is trivial; the generation of
valuable content for the task at hand, on the other hand, is a rather challenging
procedure. (One may claim, however, that the very process of generating valuable
content is also, by itself, trivial as one can design a generator that returns a random
sample of hand-crafted masterpieces.) Further, it is more challenging to generate
content that is not only valuable but is also novel or even inspiring.


4.6.1 Why Is It Difficult? 

But what makes the evaluation of content so difficult? First, it is the diverse, stochas- 
tic and subjective nature of the users that experience the content. Whether players or 
designers, content users have dissimilar personalities, gameplay aims and behavior, 
emotive responses, intents, styles and goals [228]. When designing a PCG system it
is critical to remember that we can potentially generate massive amounts of content 
for designers to interact with and players to experience. It is thus of utmost impor- 
tance to be able to evaluate how successful the outcomes of the generator might be 
across dissimilar users: players and designers. While content generation is a cheap 
process relying on algorithms, design and game-play are expensive tasks relying on 
humans who cannot afford the experience of bad content. Second, content quality 
might be affected by algorithms and their underlying stochasticity, for instance, in
evolutionary search. Content generators often exhibit non-deterministic behavior,
making it very hard to predict a priori what the outcomes of a particular generative
system might be.


4.6.2 Function vs. Aesthetics 

Particular properties of content can be objectively defined and tested whereas other 
properties of it can only be assessed subjectively. It is only natural to expect that 
functional properties of content quality can be objectively defined whereas a large 
part of its aesthetics can only be defined subjectively. For instance, playability of 
a level is a functional characteristic that can be objectively measured—e.g., an AI 
agent manages to complete the level; hence it is playable. Balance and symmetry 
can also be objectively defined to a degree through estimates of deviation from a 
norm—it may be a score (balance) or a distance from a central choke point in the
map (symmetry). There are games, however, for which content balance, symmetry 
and other functional properties are not trivially measurable. And of course there are 
several aspects of content such as the comprehensibility of a narrative, the pleasantness
of a color scheme, the preference for a room’s architectural style or the
graphics style, and the experience of sound effects and music that cannot be objectively
measured.

Functional, objectively defined, content properties can be expressed either as 
metrics or as constraints that a generator needs to satisfy. Constraints can be specified
by the content designers or imposed by other game content already available.
For instance, let us assume that a well-designed generated strategy game level needs 
to be both balanced and playable. Playability can form a simple binary constraint: 
the level is playable when an AI agent manages to complete it; it is non-playable oth- 
erwise. Balance can form another constraint by which all items, bases and resources 
are accessible to similar degrees by all players; if equal accessibility is below a 
threshold value then the constraint is not satisfied. Next, let us suppose we wish to 
generate a new puzzle for the map we just generated. Naturally, the puzzle needs 
to be compatible with our level. A PCG algorithm needs to be able to satisfy these 
constraints as part of its quality evaluation. Constraint satisfaction approaches such
as the feasible-infeasible two-population genetic algorithm [379, 382], constrained
divergent search rewarding content value but also content novelty [382] or surprise
[240], and constraint solvers such as answer set programming [638] are able to
handle this. The generated results are within constraints, thereby valuable for the
designer. Value, however, may have varying degrees of success and this is where 
alternative methods or heuristics can help, as we cover in the section below. 
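
The two constraints of the strategy level example above can be sketched in Python
as follows. This is a simplified illustration under several assumptions that are
not part of the original text: the level is a small character grid ('#' walls,
'A' and 'B' bases, 'R' resources), playability is approximated by checking that one
base can reach the other rather than by running an AI agent, and accessibility is
compared through summed shortest-path distances with a hypothetical threshold of 0.8.

```python
from collections import deque


def find(level, symbol):
    """Return the (row, column) positions of every `symbol` tile in the level."""
    return [(r, c) for r, row in enumerate(level)
            for c, tile in enumerate(row) if tile == symbol]


def distances(level, start):
    """Breadth-first search: shortest walking distance from `start` to each reachable tile."""
    rows, cols = len(level), len(level[0])
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and level[nr][nc] != '#' and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist


def is_playable(level):
    """Binary constraint: base B must be reachable from base A."""
    base_a, base_b = find(level, 'A')[0], find(level, 'B')[0]
    return base_b in distances(level, base_a)


def balance_score(level):
    """1.0 when both bases are equally close to the resources, lower otherwise."""
    dist_a = distances(level, find(level, 'A')[0])
    dist_b = distances(level, find(level, 'B')[0])
    resources = find(level, 'R')
    if any(r not in dist_a or r not in dist_b for r in resources):
        return 0.0
    total_a = sum(dist_a[r] for r in resources)
    total_b = sum(dist_b[r] for r in resources)
    if total_a + total_b == 0:
        return 1.0
    return 1.0 - abs(total_a - total_b) / (total_a + total_b)


def satisfies_constraints(level, threshold=0.8):
    """A level is feasible when it is playable and balanced above the threshold."""
    return is_playable(level) and balance_score(level) >= threshold


FAIR = ["A.R.B"]      # resource halfway between the bases
UNFAIR = ["AR..B"]    # resource next to base A
BLOCKED = ["A.#.B"]   # a wall separates the bases
```

Only the first of the three toy levels satisfies both constraints. A generator such
as a feasible-infeasible two-population genetic algorithm could use
satisfies_constraints to decide which population an individual belongs to.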


4.6.3 How Can We Evaluate a Generator? 

Generally speaking, a content generator can be evaluated in three ways: directly
by the designer or indirectly by either human players or AI agents. Designers can 
directly observe properties of the content generator and take decisions based on data 
visualization methods. Human players can play and test the content and/or provide 
feedback about the content via subjective reporting. AI agents can do the same: play
the content or measure something about the content and report it to us in the form 
of a quality metric, or metrics. Clearly, machines cannot experience the content 
but they can, instead, simulate it and provide us estimates of content experience. 
The overall evaluation process can very well combine and benefit from any of the 
above approaches. In the remainder of this section we cover the approaches of data 
visualization, AI automated playtesting and human playtesting in further detail. 


4.6.3.1 Visualization 

The visualization approach to content quality assessment is associated with a) the 
computation of meaningful metrics that can assign measurable characteristics to 
content and b) ways to visualize these metrics. The task of metric design can be 
viewed as the equivalent of fitness function design. As such, designing good content
generation quality metrics in an ad-hoc manner involves a degree of practical
wisdom. Metrics, for instance, can be based on the expressive range of the generator
under assessment, so-called expressivity metrics [640, 608]. The analysis of a
generator’s expressivity gives us access to the potential overall quality of the generator
across its full range of generative space. The generative space can then be
visualized as heatmaps or alternative graphical representations such as 2D or 3D
scatter plots (see Fig. 4.23 for an example). It is one thing if a level generator is
able to create only a few meaningful or playable levels and another if the generator
is robust and consistent with respect to the playability of its generated levels.
It is also one thing if our generator is only able to generate levels with very specific
characteristics within a narrow space of its expressive range and another if our
level generator is able to express a broad spectrum of level properties, yielding uniformly
covered expressive ranges. Such information can be directly visible on the
illustrated heatmaps or scatter plots. Alternatively, data compression methods can
be used directly on the generated content and offer us 2D or 3D representations of
the generative space, thereby bypassing the limitations of ad-hoc metric design. An
example of this approach is the use of autoencoders for compressing the images
produced by the DeLeNoX autonomous content generator [373].

Fig. 4.23 The expressive range of the Ropossum level generator for the metrics of linearity and
density. Adapted from [608].
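
An expressive range analysis can be sketched in Python as a two-dimensional histogram
over two content metrics. Everything specific in the sketch is an assumption made for
illustration: the toy generator outputs a list of platform heights, and the
definitions of density (share of columns with a platform) and linearity (one minus
the normalized mean height change between neighboring columns) are simplified
stand-ins rather than the metrics used for Ropossum.

```python
import random

MAX_HEIGHT = 4  # hypothetical maximum platform height of the toy generator


def toy_generator(rng, width=20):
    """Hypothetical generator: one random platform height per level column."""
    return [rng.randint(0, MAX_HEIGHT) for _ in range(width)]


def density(level):
    """Share of columns that contain a platform (height above zero)."""
    return sum(1 for height in level if height > 0) / len(level)


def linearity(level):
    """1.0 for a flat level, 0.0 when every step spans the full height range."""
    steps = [abs(b - a) for a, b in zip(level, level[1:])]
    return 1.0 - (sum(steps) / len(steps)) / MAX_HEIGHT


def expressive_range(levels, bins=10):
    """Count levels per (linearity, density) cell; the grid can be drawn as a heatmap."""
    grid = [[0] * bins for _ in range(bins)]
    for level in levels:
        row = min(int(linearity(level) * bins), bins - 1)
        col = min(int(density(level) * bins), bins - 1)
        grid[row][col] += 1
    return grid


rng = random.Random(0)
levels = [toy_generator(rng) for _ in range(500)]
heatmap = expressive_range(levels)
```

Cells of the resulting grid that remain empty indicate combinations of linearity and
density that the generator never produces, while a few heavily populated cells
indicate a narrow expressive range.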


4.6.3.2 AI

Using AI to playtest our generator is a safe and relatively cheap way to rapidly
retrieve quality metrics for it without relying on human playtesting. In the same
way that a search-based PCG method would use AI to simulate the content before
generating it, AI agents can test the potential of a generator across a number of
metrics and return values about its quality. The metrics can be in the form of
classes—for instance, test checks performed for completing an area of a level—
scalar values—e.g., a level’s balance—or even ordinal values—e.g., the rank of the
level in terms of asymmetry. The relevance of the metrics to the generator’s quality
is obviously dependent on the designer’s ad-hoc decisions. Once again, designing
appropriate metrics for our AI agent to compute is comparable to the challenge of
designing any utility function. An interesting approach to AI-based testing is the
use of procedural personas [267, 269]. These are data-driven inferred models of
dissimilar play styles that potentially imitate the different styles of human play. In
a sense, procedural personas provide a more human-realistic approach to AI-based
testing of a generator. Finally, by visualizing particular game artifacts or simulating
them through the use of AI agents we can have access to the information we might be
able to extract from a game, we can understand what is possible within our content
space, we can infer how rules and functions operate in whatever we generate, and
we can possibly understand how the information we are able to extract relates to
data we can extract from human playtesting [481].


4.6.3.3 Human Players 

In addition to data visualization and AI-based simulation for the evaluation of a content
generator a designer might wish to use complementary approaches that rely on
quantitative user studies and playtesting. Playtesting is regarded to be an expensive
way to test a generator but it can be of immense benefit for content quality assurance,
in particular for those aspects of content that cannot be measured objectively—e.g.,
aesthetics and playing experience. The most obvious approach to evaluate the content
experienced by players is to explicitly ask them about it. A game user study can
involve a small number of dedicated players that will play through various amounts
of content or, alternatively, a crowdsourcing approach can provide sufficient data
to machine learn content evaluation functions (see [621, 370, 121] among others).
Data obtained can be in any format including classes (e.g., a binary answer about the
quality of a level), scores (e.g., the likeliness of a sound) or ranks (e.g., a preference
about a particular level). It is important to note that the playtesting of content can
be complemented by annotations coming from the designers of the game or other
experts involved in content creation. In other words, our content generator may be
labeled with both first-person (player) and third-person (designer) annotations. Further
guidelines about which questionnaire type to use and advice about the design
of user study protocols can be found elsewhere in this book.


4.7 Further Reading

Extensive versions of most of the material covered in this chapter can be found 
in dedicated chapters of the PCG in Games textbook [616]. In particular, all the
methods and types of PCG in games are covered in further detail (see Chapters 1 to
9 of [616]). Considering the roles of PCG, the mixed-initiative and the experience-driven
roles of PCG are, respectively, detailed in Chapters 11 [374] and 10 [618]
of that textbook [616]. In addition to Chapter 11 of [616], the framework named
mixed-initiative co-creativity [774] provides a theoretical grounding for the impact
of mixed-initiative interaction on the creativity of both designers and computational
processes. Further, the original articles about the experience-driven PCG framework
can be found in [783, 784]. Finally, the topic of PCG evaluation is also covered in
the final chapter [615] of the PCG in Games textbook [616].


4.8 Exercises 

PCG offers endless opportunities for generation and evaluation across the different 
creativity domains in games and across combinations of those. As an initial step 
we would recommend the reader to start experimenting with maze generation and 
platformer level generation (as outlined below). The website of the book contains 
details regarding both frameworks and potential exercises. 


4.8.1 Maze Generation 

Maze generation is a very popular type of level generation and relevant for several 
game genres. In the first exercise we recommend that you develop a maze generator
using both a constructive and a search-based PCG approach and compare their
performance according to a number of meaningful criteria that you will define. The 
reader may use the Unity 3D open-access maze generation framework which is 
available at: http://catlikecoding.com/unity/tutorials/maze/. Further guidelines and 
exercises for maze generation can be found at the book’s website. 
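
As a possible starting point for the constructive half of this exercise, the Python
sketch below carves a maze with a depth-first ("recursive backtracker") walk. It is
independent of the Unity framework mentioned above; the text-grid representation
('#' for walls, '.' for open tiles) and the function name generate_maze are
assumptions made for illustration only.

```python
import random


def generate_maze(width, height, seed=None):
    """Carve a maze of width x height cells using an iterative depth-first walk."""
    rng = random.Random(seed)
    # Cells sit at odd grid coordinates; the tiles between them start as walls.
    grid = [['#'] * (2 * width + 1) for _ in range(2 * height + 1)]
    grid[1][1] = '.'
    visited = {(0, 0)}
    stack = [(0, 0)]
    while stack:
        x, y = stack[-1]
        unvisited = [(x + dx, y + dy)
                     for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                     if 0 <= x + dx < width and 0 <= y + dy < height
                     and (x + dx, y + dy) not in visited]
        if not unvisited:
            stack.pop()  # dead end: backtrack to the previous cell
            continue
        nx, ny = rng.choice(unvisited)
        grid[y + ny + 1][x + nx + 1] = '.'   # open the wall between the two cells
        grid[2 * ny + 1][2 * nx + 1] = '.'   # open the newly visited cell
        visited.add((nx, ny))
        stack.append((nx, ny))
    return ["".join(row) for row in grid]


maze = generate_maze(5, 4, seed=1)
```

Every cell is visited exactly once, so the result is a maze without loops in which
any two cells are connected by a single path. A search-based alternative could
instead evolve the wall layout and score candidates with criteria of your choice,
such as the length of the shortest path between entrance and exit or the number of
dead ends.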


4.8.2 Platformer Level Generation 

The platformer level generation framework is based on the Infinite Mario Bros (Persson,
2008) framework which has been used as the main framework of the Mario AI
(and later Platformer AI) Competition since 2010. The competition featured several
different tracks including gameplay, learning, Turing test and level generation. For
the exercises of this chapter the reader is requested to download the level generation
framework (https://sites.google.com/site/platformersai/) and apply constructive and
generate-and-test methods for the generation of platformer levels. The levels need to 
be evaluated using one or more of the methods covered in this book. Further details 
and exercises with the platformer level generation framework can be found at the 
book’s website. 


4.9 Summary 

This chapter viewed AI as a means for generating content in games. We defined 
procedural content generation as the algorithmic process of creating content in and 
for games and we explored the various benefits of this process. We then provided a 
general taxonomy about content and its generation and explored the various ways 
one can generate content including search-based, solver-based, grammar-based, ma- 
chine learning-based, and constructive generation methods. The use of the PCG 
method is naturally dependent on the task at hand and the type of content one wishes 
to generate. It further depends on the potential role the generator might take within 
games. We outlined the four possible roles a generator can take in games which 
are determined by the degree to which they involve the designer (autonomous vs. 
mixed-initiative) and/or the player (experience-agnostic vs. experience-driven) in 
the process. The chapter ends with a discussion on the important and rather un- 
explored topic of evaluation, the challenges it brings, and a number of evaluation 
approaches one might consider. 

We have so far covered the most traditional use of AI in games (the previous chapter) and
the use of AI for generating parts of (or complete) games (this chapter). The next
and final chapter of this part is dedicated to the player and the ways we can use AI
to model aspects of her behavior and her experience.


Chapter 5

Modeling Players 


This chapter is dedicated to players and the use of AI for modeling them. This area 
of research is often called player modeling [782, 636]. We take player modeling to
mean the detection, prediction and expression of human player characteristics that
are manifested through cognitive, affective and behavioral patterns while playing
games. In the context of this book, player modeling studies primarily the use of AI
methods for the construction of computational models of players. By model we refer
to a mathematical representation—it may be a rule set, a vector of parameters,
or a set of probabilities—that captures the underlying function between the characteristics
of the player and her interaction with the game, and the player’s response to
that interaction. Given that every game features at least one player (with some notable
exceptions), and that player modeling affects work on game-playing and
content generation, we consider the modeling of player behavior and experience as
a very important use of AI in games [764, 785].

Psychology has studied human behavior, cognition and emotions for a long 
time. Branches of computer science and human-computer interaction that attempt
to model and simulate human behavior, cognition, emotion or the feeling of emotion
(affect) include the fields of affective computing and user modeling. Player
modeling is related to these fields but focuses on the domain of games. Notably,
games can yield dynamic and complex emotions in the player, the manifestations
of which cannot be captured trivially by standard methods in empirical psychology,
affective computing or cognitive modeling research. The high potential that
games have in affecting players is mainly due to their ability to place the player in a
continuous mode of interaction, which, in turn, elicits complex cognitive, affective
and behavioral responses. Thus, the study of the player may not only contribute to 
the design of improved forms of human-computer interaction, but also advance our 
knowledge of human experiences. 

As mentioned earlier, every game features at least one user—the player—who 
controls some aspect of the game environment. The player character could be visible
in the game as an avatar or a group of entities [94], or could be invisible as in many
puzzle games and casual games. Control may vary from the relatively simple (e.g.,
limited to movement in an orthogonal grid) to the highly complex (e.g., having
to decide several times per second between hundreds of different possibilities in a
highly complex 3D world). Given these intricacies, understanding and modeling the 
interaction between the player and the game can be seen as a holy grail of game 
design and development. Designing the interaction and the emergent experience 
right results in a successful game that manages to elicit unique experiences. 

The interaction between the player(s) and the game is dynamic, real-time and 
in many cases highly complex. The interaction is also rich in the sense that many 
modalities of interaction may be involved and that the information exchange
between the game and the player may both be fast and entail large amounts of data for
us to process. If the game is well-designed, the interaction is also highly engaging 
for the player. Given the great amount of information that can be extracted through 
this interaction and used for creating models of the player, the game should be able 
to learn much about the person playing it, as a player and perhaps as a human in 
general. In fact, there is no reason why the model should not know more about how 
you play than you do. 

In the remainder of this chapter we first attempt to define the core ingredients of
player modeling (Section 5.1) and then we discuss reasons why AI should be used
to model players (Section 5.2). In Section 5.3 we provide a high-level taxonomy of
player modeling focusing on two core approaches for constructing a player model:
top-down and bottom-up. We then detail the available types of data for the model’s
input (Section 5.4), a classification for the model’s output (Section 5.5) and the
various AI methods that are appropriate for the player modeling task (Section 5.6).
The key components of player modeling as discussed in this chapter (input, output
and model) are depicted in Fig. 5.1. We conclude, in Section 5.7, with a number of
concrete examples of AI being used for modeling players.


5.1 What Player Modeling Is and What It Is Not 

One could arguably detect behavioral, emotional or cognitive aspects of both human
players and non-human players, or non-player characters (notwithstanding the
actual existence of emotions in the latter). However, in this book we focus on
aspects that can be detected from, modeled from, and expressed in games with human
players 07821 . We explicitly exclude the modeling of NPCs from our discussion in
this chapter, as in our definition, player modeling is modeling of a human player. 
Modeling the experience of an NPC would seem to be a futile exercise, as one can 
hardly say that an NPC possesses actual emotions or cognition. Modeling the be- 
havior of an NPC is also of little interest, at least if one has access to the game’s 
code: a perfect model for the NPC already exists. NPC modeling, however, can be a 
useful testbed for player modeling techniques, for instance, by comparing the model 
derived from human players with the hand-crafted one. More interestingly, it can be 
an integral component of AI that adapts its behavior in response to the dynamics of 
the NPCs—as in 12^ . Nevertheless, while the challenges faced in modeling NPCs 
are substantial, the issues raised from the modeling of human players define a far
more complex and important problem for the understanding of player experience.

[Figure 5.1: diagram not reproduced. Labels recoverable from the figure:
(Psychology, Cognitive Science, Game Studies, ...); (Data Science, Machine
Learning); Supervised Learning; Regression; Classification; Preference Learning;
Forced Response; Continuous; Time-Discrete vs. Time-Continuous; Pre vs. During vs.
Post; Absolute vs. Relative; Micro-actions vs. Macro-actions.]

Fig. 5.1 The key components of player modeling as discussed in this chapter. The
distinction between model-based and model-free approaches is outlined in
Section 5.3. The various options for the input of the model are discussed in
Section 5.4. The taxonomy for the model’s output is discussed in Section 5.5; each
box represents a dedicated subsection. Finally, the various AI methods (supervised
learning, reinforcement learning and unsupervised learning) used for modeling
corresponding output data types are discussed thoroughly in Section 5.6.

Sometimes the terms player modeling and opponent modeling 0214ll592ll48ll are
used interchangeably when a human player is modeled. However, opponent modeling
is a narrower concept referring to predicting the behavior of an adversarial
player when playing to win in an imperfect information game like Poker HSl or
StarCraft (Blizzard Entertainment, 1998) II504I . Some aspects of modeling NPCs
or simulated playthroughs for winning in a game are discussed in Chapter]^

We also make a distinction between player modeling 0116112811 and player
profiling 07821 . The former refers to modeling complex dynamic phenomena during
gameplay interaction, whereas the latter refers to the categorization of players
based on static information that does not alter during gameplay. Information of
static nature includes personality, cultural background, gender and age. We put an
emphasis on the former, but will not ignore the latter, as the availability of a
good player profile may contribute to the construction of reliable player models.

In summary, player modeling, as we define it in this book, is the study of
computational means for the modeling of a player’s experience or behavior which is
based on theoretical frameworks about player experience and/or data derived from
the interaction of the player with a game 0782117641 . Player models are built on
dynamic information obtained during game-player interaction, but they could also
rely on static player profiling information. Unlike studies focusing on taxonomies
of behavioral player modeling (e.g., via a number of dimensions 06361 or
direct/indirect measurements 06230 ), we view player modeling in a holistic manner
including cognitive, affective, personality and demographic aspects of the player.
Moreover, we exclude approaches that are not directly based on human-generated data
or not based on empirically-evaluated theories of player experience, human
cognition, affect or behavior. The chapter does not intend to provide an exhaustive
review of player modeling studies under the above definition, but rather an
introduction and a high-level taxonomy that explores the possibilities with respect
to the modeling approach, the model’s input and the model’s output.


5.2 Why Model Players? 

The primary goal of player modeling is to understand how the interaction with a 
game is experienced by individual players. Thus, while games can be utilized as 
an arena for eliciting, evaluating, expressing and even synthesizing experience, we 
argue that the main aim of the study of players in games is the understanding of 
players’ cognitive, affective and behavioral patterns. Indeed, by the very nature of 
games, one cannot dissociate games from player experience. 

There are two core reasons that drive the use of AI for modeling game players
and their play, thereby serving the primary goal of player modeling as stated
above. The first is for understanding something about their players’ experience
during play. Models of player experience are often built using machine learning
methods, typically supervised learning methods like support vector machines or
neural networks. The training data here consists of some aspect of the game or
player-game interaction, and the targets are labels derived from some assessment of
player experience, gathered for example from physiological measurements or
questionnaires CMl. Once predictors of player experience are derived they can be
taken into account for designing the in-game experience. That can be achieved by
adjusting the behavior of non-player characters (see Chapter]^ or by adjusting the
game environment (see Chapter]^.
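
As a minimal sketch of the supervised setup described above, the following Python
fragment fits a logistic regression by gradient descent. It is a simpler stand-in
for the support vector machines and neural networks named in the text, not a method
taken from the book. The two features (deaths per minute, fraction of the level
completed), the frustration labels and all function names are hypothetical and
serve only as an illustration.

```python
import math

def predict_proba(weights, x):
    """Probability of the positive label; the last weight is the bias."""
    z = sum(w * xj for w, xj in zip(weights, x)) + weights[-1]
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(features, labels, epochs=5000, lr=0.5):
    """Fit weights by batch gradient descent on the logistic loss."""
    n_features = len(features[0])
    weights = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grads = [0.0] * (n_features + 1)
        for x, y in zip(features, labels):
            err = predict_proba(weights, x) - y
            for j, xj in enumerate(x):
                grads[j] += err * xj
            grads[-1] += err  # gradient of the bias term
        weights = [w - lr * g / len(features) for w, g in zip(weights, grads)]
    return weights

# Hypothetical gameplay features: [deaths per minute, fraction of level completed].
features = [[0.2, 0.9], [0.3, 0.8], [0.1, 1.0], [1.5, 0.3], [1.8, 0.2], [1.2, 0.4]]
# Hypothetical questionnaire labels: 1 = player reported frustration, 0 = did not.
labels = [0, 0, 0, 1, 1, 1]

weights = train_logistic(features, labels)
print(predict_proba(weights, [1.6, 0.25]))  # a session with many deaths
print(predict_proba(weights, [0.2, 0.95]))  # a session with few deaths
```

In practice the labels would come from the questionnaires or physiological
measurements mentioned above, and the model would be validated on held-out players.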

The second reason why one would want to use AI to model players is for
understanding players’ behavior in the game. This area of player modeling is
concerned with structuring observed player behavior even when no measures of
experience are available, for instance by identifying player types or predicting
player behavior via game and player analytics 017811186ll . A popular distinction in
data derived from games 01861 is the one between player metrics and game metrics.
The latter is a superset of the former as it also includes metrics about the game
software (system metrics) and the game development process as a whole (process
metrics). System metrics and process metrics are important aspects of modern game
development that influence decision making with respect to procedures, business
models, and marketing. In this book, however, we focus on player metrics. The
interested reader may refer to 018611 for alternative uses of metrics in games and
the application of analytics to game development and research (i.e., game
analytics).

Once aspects of player behavior are identified a number of actions can be taken
to improve the game such as the personalization of content, the adjustment of NPCs
or, ultimately, the redesign of (parts of) the game. Derived knowledge about the
in-game behavior of the player can lead to improved game testing and game design
procedures, and better monetization and marketing strategies 01861 . Within behavior
modeling we identify four main player modeling subtasks that are particularly
relevant for game AI: imitation and prediction (achieved via supervised learning or
reinforcement learning) and clustering and association mining (achieved via
unsupervised learning). The two main purposes of player imitation are the
development of non-player characters with believable, human-like behavioral
characteristics, and the understanding of human play per se through creating
generative models of it. The prediction of aspects of player behavior, instead, may
provide answers to questions such as “when will this player stop playing?” or “how
often will that player get stuck in that area of the level?” or “which item type
will this player pick in the next room?”. The aim of clustering is the
classification of player behaviors within a number of clusters depending on their
behavioral attributes. Clustering is important for both the personalization of the
game and the understanding of playing behavior in association with the game design
M178II . Finally, association mining is useful in instances where frequent patterns
or sequences of actions (or in-game events) are important for determining how a
player behaves in a game.
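
To make the last subtask concrete, the sketch below counts consecutive action
pairs that recur across play sessions. This is a deliberately simplified form of
frequent sequence mining rather than a full association mining algorithm; the
action names, the session logs and the function name are hypothetical.

```python
from collections import Counter

def frequent_action_pairs(sessions, min_support):
    """Return consecutive action pairs seen in at least min_support sessions."""
    support = Counter()
    for actions in sessions:
        # Count each pair once per session, so support = number of sessions.
        pairs = set(zip(actions, actions[1:]))
        support.update(pairs)
    return {pair: n for pair, n in support.items() if n >= min_support}

# Hypothetical logs: one list of in-game events per play session.
sessions = [
    ["open_chest", "drink_potion", "attack", "attack"],
    ["attack", "drink_potion", "attack"],
    ["open_chest", "drink_potion", "attack"],
]

patterns = frequent_action_pairs(sessions, min_support=2)
for pair, count in sorted(patterns.items(), key=lambda item: -item[1]):
    print(pair, count)
```

Pairs that pass the support threshold are candidates for behavioral patterns;
longer sequences or association rules would need a dedicated algorithm.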

While player behavior and player experience are interwoven notions there is a 
subtle difference between them. Player behavior points to what a player does in 
a game whereas player experience refers to how a player feels during play. The 
feeling of one’s gameplay experience is clearly associated with what one does in 
the game; player experience, however, is primarily concerned with affective and 
cognitive aspects of play as opposed to mere reactions of gameplay which refer to 
player behavior. 

Given the above aims, core tasks and sub-tasks of player modeling, in the next
section we discuss the various available options for constructing a player model.


5.3 A High-Level Taxonomy of Approaches 


Irrespective of the application domain, computational models are characterized by
three core components: the input the model will consider, the computational model
per se, and the output of the model (see Fig. 5.1). The model itself is a mapping
between the input and the output. The mapping is either hand-crafted or derived
from data, or a mix of the two. In this section we will first go through the most
common approaches for constructing a computational model of players, then we
will go through a taxonomy of possible inputs for a player model (Section 5.4)
and finally we will examine aspects of player experience and behavior that a player
model can represent as its output (Section 5.5).

A high-level classification of the available approaches for player modeling can
be made between model-based (or top-down) and model-free (or bottom-up)
approaches 0782117831 . The above definitions are inspired by the analogous
classification in RL by which a world model is available (i.e., model-based) or not
(i.e., model-free). Given the two ends of this continuum, hybrid approaches between
them can naturally exist. The gradient red color of the player model box in
Fig. 5.1 illustrates the continuum between top-down and bottom-up approaches. The
remainder of this section presents the key elements of and core differences among
the various approaches for modeling of players.


5.3.1 Model-Based (Top-Down) Approaches 

In a model-based or top-down II782I approach a player model is built on a
theoretical framework. As such, researchers follow the modus operandi of the
humanities and social sciences, which hypothesize models to explain phenomena. Such
hypotheses are usually followed by an empirical phase in which it is experimentally
determined to what extent the hypothesized models fit observations; however, such
a practice is not the norm within player experience research. While user experience
has been studied extensively across several disciplines, in this book we identify
three main disciplines we can borrow theoretical frameworks from and build models
of player experience: psychology and affective sciences, neuroscience, and finally,
game studies and game research.


5.3.1.1 Psychology and Affective Sciences 


Top-down approaches to player modeling may refer to models derived from popular
theories about emotion 113641 such as the cognitive appraisal theory 11212116011 .
Further, the player model may rely on well established affect representations such
as the emotional dimensions of arousal and valence II200I that define the circumplex
model of affect of Russell 15391 (see Fig. 5.2(a)). Valence refers to how
pleasurable (positive) or unpleasurable (negative) the emotion is whereas arousal
refers to how intense (active) or lethargic (inactive) that emotion is. Following a
theoretical model, emotional manifestations of players are often mapped directly to
specific player states. For instance, by viewing player experience as a
psychophysiological phenomenon II779I a player’s increased heart rate may correspond
to high arousal and, in turn, to high levels of excitement or frustration.
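
A top-down mapping of this kind can be sketched in a few lines of Python. The
linear heart-rate scaling, the [-1, 1] value ranges and the four quadrant labels
are illustrative assumptions, not values or labels prescribed by the circumplex
model or by the studies cited above.

```python
def arousal_from_heart_rate(bpm, resting_bpm, max_bpm):
    """Scale heart rate linearly to [-1, 1]; an assumed, hand-crafted mapping."""
    scaled = 2.0 * (bpm - resting_bpm) / (max_bpm - resting_bpm) - 1.0
    return max(-1.0, min(1.0, scaled))

def circumplex_quadrant(valence, arousal):
    """Map valence and arousal in [-1, 1] to a coarse, illustrative state label."""
    if arousal >= 0:
        return "excited" if valence >= 0 else "frustrated"
    return "relaxed" if valence >= 0 else "bored"

# A raised heart rate gives high arousal; the sign of valence then separates
# excitement from frustration, as in the example in the text.
arousal = arousal_from_heart_rate(bpm=150, resting_bpm=60, max_bpm=180)
print(circumplex_quadrant(valence=0.5, arousal=arousal))
print(circumplex_quadrant(valence=-0.5, arousal=arousal))
```

Note that heart rate alone cannot tell excitement from frustration here: the
valence estimate has to come from another source, which is one reason such
hand-crafted mappings need empirical validation.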

Beyond established theories of emotion, model-based approaches can also be
inspired by a general cognitive-behavioral theoretical framework such as the theory
of mind 15401 for modeling aspects of social interactions in games. Popular
example frameworks for deriving user models in games include the usability theory
P489112901 , the belief-desire-intention (BDI) model ll^ I224L the cognitive model
by Ortony, Clore, and Collins 115121 and Skinner’s behavioristic approach 06331 with
its links to reward systems in games. Further we can draw inspiration from social
sciences and linguistics in order to model lexical aspects of gameplay interaction
(e.g., chatting). Natural language processing, opinion mining and sentiment analysis
normally rely on theoretical models that build on affective and sociological
aspects of textual communication 0517115141 .

Of particular importance is the concept of flow by Csikszentmihalyi 0151111491
11500 which has been a popular psychological construct for modeling player
experience in a top-down fashion. When in a state of flow (or else, state of
“happiness”) during an activity we tend to concentrate on the present moment, we
lose our ability of reflective self-consciousness, we feel a sense of personal
control over the situation or activity, our perception of time is altered, and we
experience the activity as intrinsically rewarding. Analogously the optimal
experience during play has been associated with a fine balance between boredom and
anxiety, also known as the flow channel (see Fig. 5.2(b)). Given its direct
relevance to player experience, flow has been adapted and incorporated for use in
game design and for the understanding of player experience 067811675114731 .
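
The flow channel lends itself to a very small rule-based model. The sketch below
assumes that challenge and skill are both normalized to [0, 1] and that the channel
has a fixed width; both the normalization and the width of 0.2 are hypothetical
choices made for illustration only.

```python
def flow_state(challenge, skill, channel_width=0.2):
    """Classify a (challenge, skill) pair against an assumed flow channel."""
    gap = challenge - skill
    if gap > channel_width:
        return "anxiety"   # the game demands more than the player can handle
    if gap < -channel_width:
        return "boredom"   # the player's skill exceeds what the game demands
    return "flow"

print(flow_state(challenge=0.9, skill=0.3))
print(flow_state(challenge=0.2, skill=0.8))
print(flow_state(challenge=0.5, skill=0.5))
```

A game could use such a rule to nudge difficulty back toward the channel, provided
that reasonable estimates of challenge and skill are available.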


5.3.1.2 Neuroscience 

A number of studies have relied on the working hypothesis of an underlying mapping
between the brain, its neural activity and player experience. However, this
relationship is not well explored and the presumptive mapping is largely unknown.
For example, interest has been associated with activity in the visual cortex and
the release of endomorphin whereas the sense of achievement has been linked to
dopamine levels 1351 . According to 1351 , neuroscientific evidence suggests that the
reward systems of games are directly associated with the dopamine-based reward
structures in the brain and that dopamine is released during gameplay 03461 .
Further, pleasure has been associated with areas in the brain responsible for
decision making, thereby revealing the direct links between gameplay experience and
decision making 05751 . Pleasure has also been associated with uncertain outcomes or
uncertain rewards 06251 as well as with interest and curiosity l43l , which are all
key elements of successful game design. Stress is also tightly coupled with player
experience given its clear association with anxiety and fear; stress can be both
monitored via physiology and regulated via game design. The testosterone levels of
players have also been measured in association to digital game activities 04441 and
findings reveal particular patterns of competition in games as testosterone factors.
Finally, it appears that trust between players in a social gaming setup could be
measured indirectly via oxytocin levels l350l .

The degree to which current findings from neuroscience are applicable to player
experience research is largely unknown since access to neural activity and brain
hormone levels remains a rather intrusive process at the time of writing.
Manifestations of brain activity such as the brain’s electrical waves (measured
through electroencephalography on our scalp) or more indirect manifestations such
as stress and anxiety (measured through skin conductance) can give us access to
approximates of brain activity. These approximates can be used for modeling the
experience of play as discussed later in this chapter.

(a) Russell’s two-dimensional circumplex model of affect. The figure contains a small number
of representative affective states (black circles).

(b) An illustration of the flow channel.

Fig. 5.2 Two popular frameworks used for modeling users and their experience in games: (a) the
arousal-valence circumplex model of affect and (b) the flow channel concept.


5.3.1.3 Game Studies and Game Research 

Theoretical models of user experience in games are often driven by work in game 
studies and game research. Examples of models that have been used extensively in 
the literature include Malone’s core design dimensions that collectively contribute 
to ‘fun’ games 14191 defined as challenge, curiosity and fantasy. In particular, 
challenge refers to the uncertainty of achieving a goal due to e.g., variable difficulty 
level, multiple level goals, hidden information, and randomness. Curiosity refers to
the player’s feeling of uncertainty with respect to what will happen next. Finally, 
fantasy is the ability of the game to show (or evoke) situations or contexts that are 
not actually present. These three dimensions have been quantified, operationalized 
and successfully evaluated in prey-predator games 07661 . physical games 0769117751 . 
preschooler games 13200 and racing games 07030 . 

Bartle’s ll33l classification of player types within games as a form of general 
player profiles can be used indirectly for modeling players. Bartle identifies four 
archetypes of players he names killers (i.e., players that focus on winning and are 
engaged by ranks and leaderboards), achievers (i.e., players that focus on achieving 
goals quickly and are engaged by achievements), socializers (i.e., players that focus
on social aspects of games such as developing a network of friends) and explorers 
(i.e., players who focus on the exploration of the unknown). Various other method- 
ologies have also been followed to derive specific player experience archetypes for 
particular classes of games ll34l 17871 . 

Other popular and interconnected views of player experience from a game design 
perspective include the theory of ‘fun’ by Koster USD, the notion of the ‘magic 
circle’ in games II587I and the four “fun” factor model of Lazzaro 03651 . Indicatively, 
Koster’s theory relates the concept of fun with learning in games; the more you learn
the more you tend to play a game. According to his theory you stop playing a game 
that is way too easy (no learning of new skills) or way too hard (no learning either). 
Lazzaro’s four fun factors are named hard fun (e.g., playing to win and see how 
good I am at it), easy fun (e.g., playing to explore new worlds and game spaces), 
serious fun (e.g., playing to feel better about myself or get better at something that 
matters to me) and people fun (e.g., playing as an excuse to invite friends over, 
or having fun merely by watching them play). Within game studies, the theoretical 
model of incorporation Il94l is a notable multifaceted approach for capturing player 
immersion. The model is composed of six types of player involvement: affective,
kinaesthetic, spatial, shared, ludic, and narrative.

With a careful analysis of the models proposed and their subcomponents one
could coherently argue that there is one underlying theoretical model of player
experience after all. While it is not the intention of this book to thoroughly
discuss the interconnections between the aforementioned models it is worth pointing
out a number of indicative examples of our envisaged overarching player experience
model. An explorer (Bartle), for instance, can be associated with the easy fun
factor of Lazzaro and the curiosity dimension of Malone. Further, the achiever
archetype (Bartle) can be linked to the serious fun factor (Lazzaro). Accordingly,
a killer archetype (Bartle) maps to the hard fun factor (Lazzaro), the challenge
dimension of Malone’s model, and a number of flow aspects. Finally, a socializer
player profile (Bartle) could be associated to people fun (Lazzaro) and, in turn,
to the shared involvement facet of Calleja 1941 .

Even though the literature on theoretical models of experience is rather rich, one
needs to be cautious with the application of such theories to games (and game
players) as the majority of the models have not been derived from or tested on
interactive media such as games. Calleja 1941 , for instance, reflects on the
inappropriateness of the concepts of ‘fun’ and ‘magic circle’ (among others) for
games. At this point it is worth noting that while ad-hoc designed models can be
extremely powerful and expressive they need to be cross-validated empirically to be
of practical use for computational player modeling; however, such practices are not
as common within the broader area of game studies and game design.


5.3.2 Model-Free (Bottom-Up) Approaches 

Model-free approaches refer to the data-driven construction of an unknown mapping
(model) between a player input and a player state. Any manifestation of
player affect or behavioral pattern could define the input of the model (see more
in Section 5.4 below). A player state, on the other hand, is any representation of
the player’s experience or current emotional, cognitive, or behavioral state; this
is essentially the output of the computational model (see more in Section 5.5).
Evidently, model-free approaches follow the modus operandi of the exact sciences,
in which observations are collected and analyzed to generate models without a
strong initial assumption on what the model looks like or even what it captures.
Player data and labels of player states are collected and used to derive the model.

Classification, regression and preference learning techniques adopted from
machine learning (see Chapter 2) or statistical approaches are commonly used for the
construction of the mapping between the input and the output. Examples include
studies in which player actions, goals and intentions are modeled and predicted for
the purpose of believable gameplay, adaptive gameplay, interactive storytelling or
even the improvement of a game’s monetization strategy 1151 Ill800ll414ll6^l592l .
In contrast to supervised learning, reinforcement learning can be applied when a
reward function, instead, can characterize aspects of playing behavior or
experience. Unsupervised learning is applicable when target outputs are not
available for predictive purposes but, alternatively, data is used for the analysis
of playing behavior (see Fig. 5.1).
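
Of the three supervised options named above, preference learning is probably the
least familiar, so a minimal sketch may help. The code below learns a linear
utility from pairwise comparisons using a perceptron-style update on feature
differences. The update rule is one simple choice among many, and the features
(challenge, novelty) and the reported preferences are hypothetical.

```python
def utility(weights, features):
    """Linear utility of a game variant described by a feature vector."""
    return sum(w * f for w, f in zip(weights, features))

def train_preference_model(pairs, epochs=50, lr=1.0):
    """Learn weights so that each preferred item scores above its alternative."""
    weights = [0.0] * len(pairs[0][0])
    for _ in range(epochs):
        for preferred, other in pairs:
            diff = [p - o for p, o in zip(preferred, other)]
            if utility(weights, diff) <= 0:
                # Ranking violated or undecided: move weights toward the difference.
                weights = [w + lr * d for w, d in zip(weights, diff)]
    return weights

# Hypothetical data: each pair is (variant the player preferred, variant rejected),
# with features [challenge, novelty].
pairs = [
    ([0.2, 0.9], [0.8, 0.1]),
    ([0.5, 0.7], [0.5, 0.2]),
    ([0.1, 0.6], [0.9, 0.3]),
]

weights = train_preference_model(pairs)
print(utility(weights, [0.3, 0.8]) > utility(weights, [0.7, 0.2]))
```

The model only learns an ordering over variants, which matches the relative
(rather than absolute) nature of pairwise preference reports.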

Bottom-up player modeling attempts date back to the early years of the game
AI field, in first-person shooters 0695116961 , racing games 07031 and variants of
Pac-Man (Namco, 1980) HTTS) . Recently, the availability of large sets of game and
player data has opened up the horizons of behavioral data mining in games (i.e.,
game data mining 01781 ). Studies that attempt to identify different behavioral,
playing and action patterns within a game are well summarized in 01861 and include
I1761l687ll6901l750l , among many others.


5.3.3 Hybrids 

The space between a completely model-based and a completely model-free
approach can be viewed as a continuum along which any player modeling approach
might be placed. While a completely model-based approach relies solely on a
theoretical framework that maps a player’s responses to game stimuli, a completely
model-free approach assumes there is an unknown function between modalities of
user input and player states that a machine learner (or a statistical model) may
discover, but does not assume anything about the structure of this function.
Relative to these extremes, the vast majority of studies in player modeling may be
viewed as hybrids that synergistically combine elements of the two approaches. The
continuum between top-down and bottom-up player modeling approaches is illustrated
with a gradient color in Fig. 5.1.


5.4 What Is the Model’s Input Like?

By now we have covered the various approaches available for modeling players
and we will, in this section, focus on what the input of such a model might be
like. The model’s input can be of three main types: (1) anything that a player is
doing in a game environment gathered from gameplay data, i.e., behavioral data
of any type such as user interface selections, preferences, or in-game actions; (2)
objective data collected as responses to game stimuli such as physiology, speech
and body movements; and (3) the game context which comprises any player-agent
interactions but also any type of game content viewed, played, and/or created.
The three input types are detailed in the remainder of this section. At the end of
the section we also discuss static profile information on the player (such as
personality) as well as web data beyond games that could feed and enhance the
capacity of a player model.


5.4.1 Gameplay

Given that games may affect the player’s cognitive processing patterns and
cognitive focus we assume that a player’s actions and preferences are linked
directly to her experience. Consequently, one may infer the player’s current
experience by analyzing patterns of her interaction with the game, and by
associating her experience with game context variables II132112391 . Any element
derived from the direct interaction between the player and the game can be
classified as gameplay input. These interpretable measures of gameplay have also
been defined as player metrics II186I . Player metrics include detailed attributes
of the player’s behavior derived from responses to game elements such as NPCs, game
levels, user menus, or embodied conversational agents. Popular examples of data
attributes include detailed spatial locations of players viewed as heatmaps 11771 ,
statistics on the use of in-game menus, as well as descriptive statistics about
gameplay, and communication with other players. Figure 5.3 shows examples of
heatmaps in the MiniDungeons* puzzle game. Both general measures (such as
performance and time spent on a task) and game-specific measures (such as the
weapons selected in a shooter game I250II ) are relevant and appropriate player
metrics.
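
As a small illustration of one such player metric, the sketch below turns a logged
trace of tile positions into a visit-count heatmap. The grid size, the trace and
the function name are hypothetical; the code is not taken from MiniDungeons.

```python
def build_heatmap(trace, width, height):
    """Count how often each grid tile was visited in a trace of (x, y) positions."""
    grid = [[0] * width for _ in range(height)]
    for x, y in trace:
        if 0 <= x < width and 0 <= y < height:  # ignore positions outside the map
            grid[y][x] += 1
    return grid

# Hypothetical playtrace on a 3x2 map: the player lingers on tile (1, 0).
trace = [(0, 0), (1, 0), (1, 0), (2, 1)]
heatmap = build_heatmap(trace, width=3, height=2)
for row in heatmap:
    print(row)
```

Aggregating such grids over many players gives the population-level view, while a
single grid characterizes one player's route through a level.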

A major limitation with the gameplay input is that the actual player experience 
is only indirectly observed. For instance, a player who has little interaction with a 
game might be thoughtful and captivated, or just bored and busy doing something 
else. Gameplay metrics can only be used to approach the likelihood of the presence 
of certain player experiences. Such statistics may hold for player populations, but 
may provide little information for individual players. Therefore, when one attempts
to use pure player metrics to make estimates of player experiences and make the 
game respond in an appropriate manner to these perceived experiences, it is advis- 
able to keep track of the feedback of the player to the game responses, and adapt 
when the feedback indicates that the player experience was gauged incorrectly. 


5.4.2 Objective 

Computer game players are presented with a wide palette of affective stimuli
during game play. Those stimuli vary from simple auditory and visual events (such
as sound effects and textures) to complex narrative structures, virtual
cinematographic views of the game world and emotively expressive game agents.
Player emotional responses may, in turn, cause changes in the player’s physiology,
reflect on the player’s facial expression, posture and speech, and alter the
player’s attention and focus level. Monitoring such bodily alterations may assist
in recognizing and constructing the player’s model. As such, the objective approach
to player modeling incorporates access to multiple modalities of player input.


* http://minidungeons.com/

(a) [...] completionist: succeeding in killing all monsters,
drinking all potions and collecting all treasure.

(b) In this example the player prioritizes reaching the exit, avoiding any monsters
and only collecting potions and treasures that are near the path to the exit.

Fig. 5.3 Two example heatmaps (human playtraces) in the MiniDungeons game.
MiniDungeons is a simple turn-based rogue-like puzzle game, implemented as a
benchmark problem for modeling the decision-making styles of human players I2g7l .


The relationship between psychology and its physiological manifestations has
been studied extensively ( l lfT?! l95l 177911558II among many others). What is widely
evidenced is that the sympathetic and the parasympathetic components of the
autonomic nervous system are involuntarily affected by affective stimuli. In
general, arousal-intense events cause dynamic changes in both nervous systems: an
increase and a decrease of activity, respectively, in the sympathetic and the
parasympathetic nervous system. Alternatively, activity at the parasympathetic
nervous system is high during relaxing or resting states. As mentioned above, such
nervous system activities cause alterations in one’s facial expression, head pose,
electrodermal activity, heart rate variability, blood pressure, pupil dilation
ll^ I624II and so on.

Recent years have seen a significant volume of studies that explore the interplay
between physiology and gameplay by investigating the impact of different gameplay
stimuli on dissimilar physiological signals ( B6971l473ll421ll420l 155611721 II 1751
145III among others). Such signals are usually obtained through electrocardiography
(ECG) [780], photoplethysmography [780, 721], galvanic skin response (GSR)
1142II127 ll 12701 \TT2\, respiration [721], electroencephalography (EEG) [493] and
electromyography (EMG).

In addition to physiology one may track the player’s bodily expressions (motion
tracking) at different levels of detail and infer the real-time affective responses
from the gameplay stimuli. The core assumption of such input modalities is that
particular bodily expressions are linked to expressed emotions and cognitive processes.
Objective input modalities, beyond physiology, that have been explored extensively
include facial expressions 1132 II [T9l 12361 l88l 17941, muscle activation (typically
face) 01331ll64ll, body movement and posture Il2^l731ll321lll72ll47l, speech
|l74T][3l9l[308l|306l[30l, text EniinTlElI], haptics |l509l, gestures 1283], brain
waves 0559l[T3l, and eye movement 11^14691.

While objective measurements can be a very informative way of assessing the
player’s state during the game, a major limitation with most of them is that they can
be invasive, thus affecting the player’s experience with the game. In fact, some types
of objective measures appear to be implausible within commercial-standard game
development. Pupillometry and gaze tracking, for instance, are very sensitive to
distance from screen, and variations in light and screen luminance, which collectively
make them rather impractical for use in a game application. The recent rebirth
of virtual reality (VR), however, gives eye gaze sensing technologies entirely new
opportunities and use within games [628]: a notable example of a VR headset that
features eye-tracking is FOVE.^ Other visual cues obtained through a camera (facial
expressions, body posture and eye movement) require a well-lit environment which
is often not present in home settings (e.g., when playing video-games) and they can
be seen by some players as privacy hazards (as the user is continuously recorded).
Even though highly unobtrusive, the majority of the vision-based affect-detection
systems currently available have additional limitations when asked to operate in
real-time [794]. We argue that an exception to this rule is body posture, which can
both be effectively detected nowadays and provide us with meaningful estimates
of player experience [343]. Aside from the potential they might have, however, the
appropriateness of camera-based input modalities for games is questionable since
experienced players tend to stay still while playing games GD.

As a response to the limitations of camera-based measurements, speech and text
(e.g., chat) offer two highly accessible, real-time efficient and unobtrusive modalities
with great potential for gaming applications; however, they are only applicable
to games where speech (or text) forms a control modality (as e.g., in conversational
games for children [320, 789]), collaborative games that naturally rely on speech
or text for communication across players (e.g., in collaborative first-person shooters),
or games that rely on natural language processing such as text-based adventure
games or interactive fiction (see discussion of Chapter|^.

Within players’ physiology, existing hardware for EEG, respiration and EMG
requires the placement of body parts such as the head, the chest or parts of the face
on the sensors, making those physiological signals rather impractical and highly intrusive
for most games. In contrast, recent sensor technology advancements for
the measurement of electrodermal activity (skin conductivity), photoplethysmography
(blood volume pulse), heart rate variability and skin temperature have made
those physiological signals more attractive for the study of affect in games. Real-time
recordings of these can nowadays be obtained via comfortable wristbands and
stored in a personal computer or a mobile device via a wireless connection [779].

At the moment of writing there are a few examples of commercial games that
utilize physiological input from players. One particularly interesting example is
Nevermind (Flying Mollusk, 2015), a biofeedback-enhanced adventure horror game
that adapts to the player’s stress levels by increasing the level of challenge it provides;
the higher the stress the more the challenge for the player (see Fig. 5.4). A
number of sensors which detect heart activity are available for affective interaction
with Nevermind. The Journey of Wild Divine (Wild Divine, 2001) is another
biofeedback-based game designed to teach relaxation exercises via the player’s
blood volume pulse and skin conductance. It is also worth noting that AAA game
developers such as Valve have experimented with the player’s physiological input
for the personalization of games such as Left 4 Dead (Valve, 2008) m.


^ https://www.getfove.com/

Fig. 5.4 A screenshot from Nevermind (Flying Mollusk, 2015). The game supports several
off-the-shelf sensors that allow the audiovisual content of the game to adapt to the stress
levels of the player. Image obtained from Erin Reynolds with permission.


5.4.3 Game Context

In addition to gameplay and objective input, the game’s context is a necessary input
for player modeling. Game context refers to the momentary state of the game
during play and excludes any gameplay elements; those are already discussed in
the gameplay input section. Clearly, our gameplay affects some aspects of the game
context and vice versa but the two can be viewed as separate entities. Viewing this
relationship through an analytics lens, the game context can be seen as a form of game
metrics, as opposed to gameplay which is a form of player metrics.

The importance of the game context for modeling players is obvious. In fact, we
could argue that the context of the game during the interaction is a necessary input
for reliably detecting any cognitive and affective responses of players. It could also
be argued that the game context is necessary as a guide during the annotation of the
player experience; we discuss this further in Section 5.5. In the same way
that we require the current social and cultural context to better detect the underlying
emotional state behind a particular facial expression of our discussant, any player
reactions cannot be dissociated from the stimulus (or the game context) that elicited
them. Naturally, player states are always linked to game context. As a result, player
models that do not take context into account run a risk of inferring erroneous states
for the player. For example, an increase in galvanic skin response can be linked to
different high-arousal affective states such as frustration and excitement. It is very
hard to tell, however, what the heightened galvanic skin response “means” without
knowing what is happening in the game at the moment. In another example, a particular
facial expression of the player, recorded through a camera, could be associated
with either an achievement in the game or a challenging moment, and needs to be
triangulated with the current game state to be understood. Evidently, such dualities
of the underlying player state may be detrimental for the design of the player model.

While a few studies have investigated physiological reactions of players in isolation,
good practice in player modeling commands that any reactions of the players
are triangulated with information about the current game state. For instance,
the model needs to know if the GSR increases because the player died or completed
the level. The game context, naturally combined (or fused) with other input
modalities from the player, has been used extensively in the literature for the
prediction of different affective and cognitive states relevant to playing experience
Il45l114^152111^ 15721 fT^IB^l558l 1452114^.
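
The triangulation of a physiological reaction with the game state can be illustrated with a minimal sketch. It is not taken from any of the cited studies: the event names, the sample values, the rise threshold, the time window and the helper `label_gsr_rises` are all assumptions made for illustration. The idea is simply to attach the most recent game event to every sharp rise in the GSR signal, so that the same rise can be read as "the player died" or "the player completed the level".

```python
from bisect import bisect_right


def label_gsr_rises(gsr, events, rise=0.2, window=3.0):
    """Attach the most recent game event to every sharp GSR rise.

    gsr    -- list of (time_sec, conductance) samples, sorted by time
    events -- list of (time_sec, event_name) game-context entries, sorted by time
    rise   -- minimum increase between consecutive samples treated as a rise (assumed)
    window -- how many seconds back an event may lie and still explain the rise (assumed)
    """
    event_times = [t for t, _ in events]
    labelled = []
    for (_, v0), (t1, v1) in zip(gsr, gsr[1:]):
        if v1 - v0 < rise:
            continue  # no sharp increase between these two samples
        i = bisect_right(event_times, t1) - 1  # index of last event at or before t1
        if i >= 0 and t1 - event_times[i] <= window:
            labelled.append((t1, events[i][1]))
        else:
            labelled.append((t1, "unknown"))  # rise without any nearby game context
    return labelled


# Hypothetical recording: two GSR rises, each preceded by a different game event.
gsr = [(0.0, 1.00), (1.0, 1.02), (2.0, 1.40), (3.0, 1.41), (9.0, 1.90)]
events = [(1.5, "player_died"), (8.5, "level_completed")]
print(label_gsr_rises(gsr, events))
```

Without the `events` list both rises would look identical to a model; with it, each rise carries the context needed to interpret it.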


5.4.4 Player Profile 

A player profile includes all the information about the player that is static and
not directly (nor necessarily) linked to gameplay. This may include information
on player personality (such as expressed by the Five Factor Model of personality
ifUQll^), culture-dependent factors, and general demographics such as gender
and age. A player’s profile may be used as input to the player model to complement
the captured in-game behavior with general attributes about the player. Such
information may lead to more precise predictive models about players.

While gender, age [787, 686], nationality ll46t and player expertise level ll9^
have already proven important factors for profiling players, the role of personality
remains somewhat contentious. On the one hand, the findings of van Lankveld et al.
[736, 737], for instance, reveal that gameplay behavior does not necessarily correspond
to a player’s behavior beyond the game. On the other hand, Yee et al. have
identified strong correlations between player choices in World of Warcraft (Blizzard
Entertainment, 2004) and the personalities of its players [788]. Strong correlations
have also been found between the playing style and personality in the first-person
shooter Battlefield 3 (Electronic Arts, 2011) [687]. In general, we need to acknowledge
that there is no guaranteed one-to-one mapping between a player’s in-game
behavior and personality, and that a player’s personality profile does not necessarily
indicate what the player would prefer or like in a game [782].


5.4.5 Linked Data 

Somewhere between the highly dynamic in-game behavior and the static profile information
about the player we may also consider linked data retrieved from web
services that are not associated with gameplay per se. This data, for instance, may
include our social media posts, emoticons, emojis [199], tags used, places visited,
game reviews written, or any relevant semantic information extracted from diverse
Web content. The benefit of adding such information to player models is manifold
but it has so far seen limited use in games ll32]| . In contrast to current player
modeling approaches, the use of massive amounts and dissimilar types of content
across linked online sources would enable the design of player models which are
based on user information stored across various online datasets, thereby realizing
semantically-enriched game experiences. For example, both scores and sentiment-analyzed
textual reviews [517, 514] from game review sites such as Metacritic^ or
GameRankings'* can be used as input to a model. This model can then be used to
create game content which is expected to appeal to specific parts of the community,
based, for instance, on demographics, skill or interests collected from the
user’s in-game achievements or favored games [585].


5.5 What Is the Model’s Output Like?


The model’s output, i.e., that which we wish to model, is usually a representation of
the player’s state. In this section we explore three options for the output of the model
that serve different purposes in player modeling. If we wish to model the experience
of the player the output is provided predominately through manual annotation.
If instead we wish to model aspects of player behavior the output is predominately
based on in-game actions (see Fig. 5.1). Finally, it may very well be that the model
has no output. Section 5.5.1 and Section 5.5.2 discuss the particularities of the output,
respectively, for the purpose of behavioral modeling and experience modeling
whereas Section 5.5.3 explores the condition where the model has no outputs.


^ http://www.metacritic.com 
'* http://www.gamerankings.com/


5.5.1 Modeling Behavior

The task of modeling player behavior refers to the prediction or imitation of a particular
behavioral state or a set of states. Note that if no target outputs are available then
we are faced with either an unsupervised learning problem or a reinforcement learning
problem which we discuss in Section 5.5.3. The output we must learn to predict
(or imitate) in a supervised learning manner can be of two major types of gameplay
data: either micro-actions or macro-actions (see Fig. 5.1). The first machine
learning problem considers the moment-to-moment game state and player action
space that are available at a frequency of frame rates. For example, we can learn
to imitate the moves of a player on a frame-to-frame basis by comparing the play
traces of an AI agent and a human as e.g., done for Super Mario Bros (Nintendo,
1985) 11511114691 . When macro-actions are considered instead, the target output is
normally an aggregated feature of player behavior over time, or a behavioral pattern.
Examples of such outputs include game completion times, win rates, churn,
trajectories, and game balance.
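
The difference between the two output types can be made concrete with a small sketch. The data layout below (lists of action names per frame, and session dictionaries with `duration`, `won` and `completed` keys) is purely hypothetical, as are the function names. The first function compares a human and an agent trace frame by frame (micro-actions); the second aggregates whole sessions into features such as win rate and mean completion time (macro-actions).

```python
from statistics import mean


def action_agreement(human_actions, agent_actions):
    """Micro level: fraction of frames on which the agent chose the human's action."""
    pairs = list(zip(human_actions, agent_actions))
    return sum(h == a for h, a in pairs) / len(pairs)


def macro_features(sessions):
    """Macro level: aggregate sessions into behavioral features.

    sessions -- list of dicts with the assumed keys
                'duration' (seconds played), 'won' (bool), 'completed' (bool)
    """
    completed = [s["duration"] for s in sessions if s["completed"]]
    return {
        "sessions": len(sessions),
        "win_rate": sum(s["won"] for s in sessions) / len(sessions),
        "mean_completion_time": mean(completed) if completed else None,
    }


# Hypothetical frame-by-frame traces of a human and an imitating agent.
human = ["right", "right", "jump", "right"]
agent = ["right", "jump", "jump", "right"]
agreement = action_agreement(human, agent)

# Hypothetical session log for one player.
sessions = [
    {"duration": 120.0, "won": True, "completed": True},
    {"duration": 300.0, "won": False, "completed": False},
    {"duration": 180.0, "won": True, "completed": True},
    {"duration": 200.0, "won": False, "completed": True},
]
features = macro_features(sessions)
print(agreement, features)
```

A supervised learner would use the per-frame actions as targets in the first case and the aggregated values as targets in the second.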


5.5.2 Modeling Experience 

To model the experience of the player one needs to have access to labels of that
experience. Those labels ideally need to be as close to the ground truth of experience
as possible. The ground truth (or gold standard) in affective sciences refers
to a hypothesized and unknown label, value, or function, that best characterizes and
represents an affective construct or an experience. Labels are normally provided
through manual annotation which is a rather laborious process. Manual annotation
is however necessary given that we require some estimate of the ground truth for
subjective notions such as the emotional states of the player. The accuracy of that
estimation is regularly questioned as there are numerous factors contributing to a
deviation between a label and the actual underlying player experience.

Manually annotating players and their gameplay is a challenge in its own right
with respect to both the human annotators involved and the annotation protocol chosen
[455, 777]. On one hand, the annotators need to be skilled enough to be able to
approximate the actual experience well. On the other hand, there are still many open
questions left for us to address when it comes to the annotation tools and protocols
used. Such questions include: Who will do the labeling: the person experiencing
the gameplay or others? Will the labeling of player experience involve states (discrete
representation) or instead involve the use of intensity or experience dimensions
(continuous representation)? When it comes to time, should it be done in real-time
or offline, in discrete time periods or continuously? Should the annotators be asked
to rate the affect in an absolute fashion or, instead, rank it in a relative fashion?
Answers to the above questions yield different data annotation protocols and, inevitably,
varying degrees of data quality, validity and reliability. In the following
sections we attempt to address a number of such critical questions that are usually
raised in subjective annotations of player states.


5.5.2.1 Free Response or Forced Response? 

Subjective player state annotations can be based either on a player’s free response,
retrieved via e.g., a think-aloud protocol [555], or on forced responses retrieved
through questionnaires or annotation tools. Free response naturally contains richer
information about the player’s state, but it is often unstructured, even chaotic, and
thus hard to analyze appropriately. On the other hand, forcing players to self-report
their experiences using directed questions or tasks constrains them to specific questionnaire
items which could vary from simple tick boxes to multiple choice items.
Both the questions and the answers we provide to annotators may vary from single
words to sentences. Questionnaires can contain elements of player experience (e.g.,
the Game Experience Questionnaire [286]), demographic data and/or personality
traits (e.g., a validated psychological profiling questionnaire such as the NEO-PI-R
[140]). In the remainder of this section we will focus on forced responses as these
are easier to analyze and are far more appropriate for data analysis and player modeling
(as defined in this book).


5.5.2.2 Who Annotates? 

Given the subjective nature of player experience the first natural question that comes
to mind is who annotates players? In other words, who has the right authority and
the best capacity to provide us with reliable tags of player experience? We distinguish
two main categories: annotations can either be self-reports or reports expressed
indirectly by experts or external observers [783].

In the first category the player states are provided by the players themselves and
we call that first-person annotation. For example, a player is asked to rate the level
of engagement while watching her playthrough video. First-person is clearly the
most direct way to annotate a player state and build a model based on the solicited
annotations. We can only assume there is disparity between the true (inner) experience
of each player and the experience as felt by herself or perceived by others.
Based on this assumption the player’s annotations should normally be closer to her
inner experience (ground truth) compared to third-person annotation. First-person
annotation, however, may suffer from self-deception and memory limitations [778].
These limitations have been attributed mainly to the discrepancies between “the experiencing
self” and “the remembering self” of a person [318] which is also known
as the memory-experience gap [462].

Expert annotators, as a response to the above limitations, may instead be able
to surpass the perception of experience and reach out to the inner experience of
the player. In this second annotation category, named third-person annotation, an
expert (such as a game designer) or an external observer provides the player state
in a more objective manner, thereby reducing the subjective biases of first-person
perceptions. For instance, a user experience analyst may provide particular player
state tags while observing a first-person shooter deathmatch game. The benefit of
third-person annotation is that multiple annotators can be used for a particular gameplay
experience. In fact, the availability of several such subjective perceptions of
experience may allow us to approximate the ground truth better as the agreement
between many annotators enhances the validity of our data directly. A potential
disagreement, on the other hand, might suggest that the gameplay experience we
examine is non-trivial or may indicate that some of our annotators are untrained or
inexperienced.
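
Agreement between annotators can be quantified in several ways; the text above does not prescribe one. As a minimal sketch, the example below computes Cohen's kappa, a standard chance-corrected agreement measure for two annotators who assign discrete tags to the same gameplay segments. The tags and the two annotation lists are invented for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: probability of a match if labels were drawn independently
    # from each annotator's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical third-person tags for four gameplay segments.
annotator_1 = ["excited", "excited", "annoyed", "annoyed"]
annotator_2 = ["excited", "annoyed", "annoyed", "annoyed"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(kappa)
```

Values near 1 indicate strong agreement, values near 0 indicate agreement no better than chance; a low value is the kind of signal that, as noted above, may point to a non-trivial experience or to untrained annotators.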


5.5.2.3 How Is Player Experience Represented? 


Another key question is how player experience is best represented: as a number of
different states (discrete) or, alternatively, as a set of dimensions (continuous)? On
one hand, discrete labeling is practical as a means of representing player experience
since the labels can easily form individual items (e.g., “excited”, “annoyed” etc.) in
an experience questionnaire, making it easy to ask the annotator/player to pick one
(e.g., in [621]). Continuous labeling, on the other hand, appears to be advantageous
for two key reasons. First, experiential states such as immersion are hard to capture
with words or linguistic expressions that have fuzzy boundaries. Second, states do
not allow for variations in experience intensity over time since they are binary: either
the state is present or not. For example, the complex notions of fun, or even
engagement, cannot be easily captured by their corresponding linguistic representation
in a questionnaire or define well a particular state of a playing experience.
Instead it seems natural to represent them as a continuum of experience intensity
that may vary over time. For these reasons we often observe low agreement among
the annotators [143] when we represent playing experience via discrete states.

As discussed earlier, the dominant approach in continuous annotation is the use
of Russell’s two-dimensional (arousal-valence) circumplex model of affect Il58n
(see Fig. 5.2(a)). Figure 5.5 illustrates two different annotation tools (FeelTrace and
AffectRank) that are based on the arousal-valence circumplex model of affect. Figure
5.6 depicts the RankTrace continuous annotation tool which can be used for the
annotation of a single dimension of affect (i.e., tension in this example). All three
tools are accessible and of direct use for annotating playing experience.


5.5.2.4 How Often to Annotate? 


Annotation can happen either within particular time intervals or continuously. Time-continuous
annotation has been popularized due to the existence of freely available
tools such as FeelTrace [144] (see Fig. 5.5(c)) and GTrace [145], which allow for
continuous annotation of content (mostly videos and speech) across the dimensions
of arousal and valence. In addition to FeelTrace there are annotation tools like the
continuous measurement system [454] and EmuJoy [474], where the latter is designed
for the annotation of music content. User interfaces such as wheels and
knobs linked to the above annotation tools show further promise for the continuous
annotation of experience in games 0125II3971IW1 (see Fig. 5.6). The continuous
annotation process, however, appears to require a higher cognitive load
compared to a time-discrete annotation protocol. Higher cognitive loads often result
in lower levels of agreement between different annotators and yield unreliable data
for modeling player experience [166, 418].

As a response to the above limitations, time-discrete annotation provides data at
particular intervals when the annotator feels there is a change in the player’s state.
And changes are best indicated relatively rather than absolutely. AffectRank, for
instance (see Fig. 5.5(b)), is a discrete, rank-based annotation tool that can be used
for the annotation of any type of content including images, video, text or speech and
it provides annotations that are significantly more reliable (with respect to inter-rater
agreement) than the annotations obtained from continuous annotation tools such as
FeelTrace [777]. The rank-based design of AffectRank is motivated by observations
of recent studies in third-person video annotation indicating that “... humans are
better at rating emotions in relative rather than absolute terms.” [455, 772]. Further,
AffectRank is grounded in numerous findings showcasing the supremacy of ranks
over ratings for obtaining annotations of lower inconsistency and order effects liTTTi
177^17781143611455117611 .
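
One way to see why relative information is more robust than absolute values is to convert each annotator's ratings into pairwise preferences. The sketch below is an illustration only and is not the procedure implemented by AffectRank; the item names, the rating values and the helper `ratings_to_preferences` are assumptions. Two annotators who use different parts of a rating scale still yield the same set of preferences.

```python
from itertools import combinations


def ratings_to_preferences(ratings):
    """Turn ONE annotator's ratings into pairwise preferences.

    ratings -- dict mapping an item (e.g., a game level) to that annotator's rating
    Returns a list of (preferred_item, other_item); tied pairs are dropped
    because they carry no order information.
    """
    prefs = []
    for a, b in combinations(ratings, 2):
        if ratings[a] > ratings[b]:
            prefs.append((a, b))
        elif ratings[b] > ratings[a]:
            prefs.append((b, a))
    return prefs


# Hypothetical engagement ratings: annotator B consistently rates one point higher.
annotator_a = {"level1": 4, "level2": 2, "level3": 4}
annotator_b = {"level1": 5, "level2": 3, "level3": 5}
prefs_a = ratings_to_preferences(annotator_a)
prefs_b = ratings_to_preferences(annotator_b)
print(prefs_a)
print(prefs_a == prefs_b)
```

Comparisons are made only within a single annotator's own ratings, so the differing personal scales of the two annotators do not affect the result.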

A recent tool that builds on the relative-based annotation of AffectRank and allows
for the annotation of affect in a continuous yet unbounded fashion is RankTrace
(see Fig. 5.6). The core idea behind RankTrace is introduced in [125]: the tool asks
participants to watch the recorded playthrough of a play session and annotate in
real-time the perceived intensity of a single emotional dimension. The annotation
process in RankTrace is controlled through a “wheel-like” hardware, allowing participants
to meticulously increase or decrease emotional intensity by turning the
wheel, similarly to how volume is controlled in a stereo system. Further, the general
interfacing design of RankTrace builds on the one-dimensional GTrace annotation
tool [145]. Unlike other continuous annotation tools, however, annotation in RankTrace
is unbounded: participants can continuously increase or decrease the intensity
as desired without constraining themselves to an absolute scale. This design decision
is built on the anchor [607] and adaptation level [258] psychology theories by
which affect is a temporal notion based on earlier experience that is best expressed
in relative terms [765, 777, 397]. The use of RankTrace has revealed the benefits
of relative and unbounded annotation for modeling affect more reliably [397] and
has also shown promise for the construction of general models of player emotion
across games iii.
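
Because an unbounded trace has no meaningful absolute scale, a natural way to process it is to look only at how it changes. The following sketch is an assumption-laden illustration and not the processing used by RankTrace or the cited studies: the trace values, the window size, the tolerance and the helper `relative_changes` are all hypothetical. It splits a trace into windows and labels each window by the direction of change of its mean relative to the previous window.

```python
def relative_changes(trace, window, tolerance=0.0):
    """Label each window of an annotation trace relative to the previous window.

    trace     -- list of annotation values on an arbitrary, unbounded scale
    window    -- number of samples per window (the last window may be shorter)
    tolerance -- minimum difference in window means counted as a change (assumed)
    """
    chunks = [trace[i:i + window] for i in range(0, len(trace), window)]
    means = [sum(chunk) / len(chunk) for chunk in chunks]
    labels = []
    for prev, cur in zip(means, means[1:]):
        if cur - prev > tolerance:
            labels.append("increase")
        elif prev - cur > tolerance:
            labels.append("decrease")
        else:
            labels.append("stable")
    return labels


# Hypothetical tension trace and the same trace shifted by an arbitrary offset.
trace = [0, 1, 3, 4, 4, 4, 2, 1]
shifted = [value + 100 for value in trace]
labels = relative_changes(trace, window=2)
print(labels)
print(labels == relative_changes(shifted, window=2))
```

The labels are unchanged when the whole trace is shifted by a constant, which is the property that makes relative processing suitable for traces that are not tied to an absolute scale.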


5.5.2.5 When to Annotate? 

When is it best to annotate experience: before, during or after play (see Fig. IU)?
In a pre-experience questionnaire we usually ask annotators to set the baseline of
a player’s state prior to playing a game. This state can be influenced by a number
of factors such as the mood of the day, the social network activity, the caffeine
consumption, earlier playing activity and so on. This is a wealth of information that
can be used to enrich our models. Again, what is worth detecting is the relative
change [765] from the baseline state of the user prior to playing the game.


(a) Annotating facial expressions of players during play. The annotation is context-dependent as
the video includes the gameplay of the player (see top left corner).

(b) AffectRank: A time-discrete annotation tool for arousal and valence.

(c) FeelTrace: A time-continuous annotation tool for arousal and valence.

Fig. 5.5 An example of third-person annotation based on videos of players and their gameplay
(a) using either the AffectRank (b) or the FeelTrace (c) annotation tool. AffectRank is freely
available at: https://github.com/TAPeri/AffectRank. FeelTrace is freely available at:
http://emotion-research.net/download/Feeltrace%20Package.zip.

Fig. 5.6 The RankTrace annotation tool. In this example the tool is used for the annotation of
tension in horror games. Participants play a game and then they annotate the level of tension by
watching a video-recorded playthrough of their game session (top of image). The annotation trace
is controlled via a wheel-like user interface. The entire annotation trace is shown for the
participant’s own reference (bottom of image). Image adapted from (3921 . The RankTrace tool is available
at: http://www.autogamedesign.eu/software.


A during-experience protocol, on the other hand, may involve the player in a
first-person think-aloud setup [555] or a third-person annotation design. For the
latter protocol you may think of user experience experts that observe and annotate 
player experience during the beta release of a game, for example. As mentioned 
earlier, first-person annotation during play is a rather intrusive process that disrupts 
the gameplay and risks adding experimental noise to annotation data. In contrast, 
third-person annotation is not intrusive; however, there are expected deviations from 
the actual first-person experience, which is inaccessible to the observer. 

The most popular approach for the annotation of player experience is after a game
(or a series of games) has been played, in a post-experience fashion. Post-experience
annotation is unobtrusive for the player and it is usually performed by the players
themselves. Self-reports, however, are memory-dependent by nature and memory is,
in turn, time-dependent. Thus, one needs to consider carefully the time window between
the experience and its corresponding report. For the reported post-experience
to be a good approximation of the actual experience the playing time window needs
to be small in order to minimize memory biases, yet sufficiently large to elicit particular
experiences in the player. The higher the cognitive load required to retrieve
the gameplay context is, the more the reports are memory-biased and not relevant to
the actual experience. Further, the longer the time window between the real experience
and the self-report the more the annotator activates aspects of episodic memory
associated with the gameplay imi. Episodic memory traces that form the basis of
self-reports fade over time, but the precise rate at which this memory decay occurs
is unknown and most likely individual [571]. Ideally, memory decay is so slow that
the annotator will have a clear feeling of the gameplay session when annotating it.
Now, if the time window becomes substantial (on the scale of hours and days) the
annotator has to activate aspects of semantic memory such as general beliefs about
a game. In summary, the more the episodic memory, and even more so the semantic
memory, are activated during annotation, the more systematic errors are induced
within the annotation data.

As a general rule of thumb, the longer it takes for us to evaluate an experience
of ours the larger the discrepancy between the true experience and the evaluation of
the experience, which is usually more intense than the true experience. It also seems
that this gap between our memory of experience and our real experience is more
prominent when we report unpleasant emotions such as anger, sadness and tension
rather than positive emotions [462]. Another bias that affects how we report our
experience is the experience felt near the end of a session, a game level or a game;
this effect has been named the peak-end rule [462].

An effective way to assist episodic memory and minimize post-experience cognitive
load is to show annotators replay videos of their gameplay (or the gameplay of
others) and ask them to annotate those. This can be achieved via crowdsourcing 09^
in a third-person manner or on a first-person annotation basis [125, 271, 397, 97].
Another notable approach in this direction is the data-driven retrospective interviewing
method [187]. According to that method player behavioral data is collected
and is analyzed to drive the construction of interview questions. These questions are
then used in retrospect (post-experience) to reflect on the annotator’s behavior.


5.5.2.6 Which Annotation Type? 


We often are uncertain about the type of labels we wish to assign to a player state or
a player experience. In particular, we can select from three data types for our annotation:
ratings, classes, and ranks (see Fig. 5.1). The rating-based format represents a
player’s state with a scalar value or a vector of values. Ratings are arguably the dominant
practice for quantitatively assessing aspects of a user’s behavior, experience,
opinion or emotion. In fact, the vast majority of user and psychometric studies have
adopted rating questionnaires to capture the opinions, preferences and perceived experiences
of experiment participants (see llT^ 14421II191 among many). The most
popular rating-based questionnaire follows the principles of a Likert scale [384] in
which users are asked to specify their level of agreement with (or disagreement
against) a given statement; see Fig. 5.7(a) for an example. Other popular rating-based
questionnaires for user and player experience annotation include the Geneva
Wheel model [600], the Self-Assessment Manikin [468], the Positive and Negative
Affect Schedule [646], the Game Experience Questionnaire [286], the Flow State
Scale [293] and the Player Experience of Need Satisfaction (PENS) survey [583],
which was developed based on self-determination theory 01621 . 
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Rating-based repoiting has notable inherent limitations that are often over- 
looked, resulting in fundamentally flawed analyses 0778112980 . First, ratings are 
analyzed traditionally by comparing their values across participants; see 023311427ll 
among many. While this is a generally accepted and dominant practice it neglects 
the existence of inter-personal differences as the meaning of each level on a rat- 
ing scale may differ across experiment participants. For example, two participants 
assessing the difficulty of a level may assess it as exactly the same difficulty, but 
then one rates it as “very easy to play” and the other as “extremely easy to play”. 
It turns out that there are numerous factors that contribute to the different internal 
rating scales existent across participants 04551 such as differences in personality, 
culture 06431 . temperament and interests EQ). Further, a large volume of studies 
has also identified the presence of primacy and recency order effects in rating-based 
questionnaires (e.g., 0113117731 ). systematic biases towards parts of the scale 03880 
(e.g., right-handed participants may tend to use the right side of the scale) or a 
fixed tendency over time (e.g., on a series of experimental conditions, the last ones 
are rated higher). Indicatively, the comparative study of 07730 between ratings and 
ranks showcases higher inconsistency effects and significant order (recency) effects 
existent in ratings. 

In addition to inter-personal differences, a critical limitation arises when ratings
are treated as interval values, since ratings are by nature ordinal values
[657, 298]. Strictly speaking, any approach or method that treats ratings as numbers
by, for instance, averaging their ordinal labels is fundamentally flawed. In most
questionnaires Likert items are represented as pictures (e.g., different
representations of arousal in the Self-Assessment Manikin [468]) or as adjectives
(e.g., “moderately”, “fairly” and “extremely”). These labels (images or adjectives)
are often erroneously converted to integer numbers, violating basic axioms of
statistics which suggest that ordinal values cannot be treated as interval values
[657] since the underlying numerical scale is unknown. Note that even when a
questionnaire features ratings as numbers (e.g., see Fig. 5.7(a)), the scale is
still ordinal as the numbers in the instrument represent labels. Thus, the
underlying numerical scale is still unknown and dependent on the participant
[657, 515, 361]. Treating ratings as interval values is grounded in the assumption
that the difference between consecutive ratings is fixed and equal. However, there
is no valid assumption suggesting that a subjective rating scale is linear [298].
For instance, the difference between “fairly (4)” and “extremely (5)” may be larger
than the distance between “moderately (3)” and “fairly (4)” as some experiment
participants rarely use the extremes of the scale or tend to use one extreme more
than the other [361]. If, instead, ratings are treated naturally as ordinal data, no
assumptions are made about the distance between rating labels, which avoids
introducing data noise to the analysis.
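
To make the point about the unknown numerical scale concrete, the following minimal
sketch recodes the same 5-point labels under two order-preserving numeric codings.
It is an illustration only: the ratings are made up, and the names `linear_coding`,
`stretched_coding`, `level_a` and `level_b` are hypothetical. Under these
assumptions the comparison of means flips between the two codings, whereas the
comparison of medians, which relies only on order, does not.

```python
from statistics import mean, median

# Two equally order-preserving numeric codings of the same five labels.
# Neither is "the" correct one: the true distances between labels are unknown.
linear_coding = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
stretched_coding = {1: 1, 2: 2, 3: 3, 4: 4, 5: 10}  # "extremely" far from "fairly"

# Hypothetical ratings (label indices) given to two game levels.
level_a = [1, 1, 5]
level_b = [3, 3, 3]


def summarize(ratings, coding):
    """Return (mean, median) of the ratings under a given numeric coding."""
    values = [coding[r] for r in ratings]
    return mean(values), median(values)


for name, coding in [("linear", linear_coding), ("stretched", stretched_coding)]:
    mean_a, median_a = summarize(level_a, coding)
    mean_b, median_b = summarize(level_b, coding)
    print(name,
          "| mean A > mean B:", mean_a > mean_b,
          "| median A > median B:", median_a > median_b)
```

Any conclusion that changes when the labels are recoded in an order-preserving way
depends on an assumed scale rather than on what the annotators reported.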

The second data type for the annotation of players is the class-based format.
Classes allow annotators to select from a finite and non-structured set of options
and, thus, a class-based questionnaire provides nominal data among two (binary) or
more options. The questionnaire asks subjects to pick a player state from a
particular representation which could vary from a simple boolean question (was that
game level frustrating or not? is this a sad facial expression? which level was the
most stressful?) to a player state selection from, for instance, the circumplex
model of affect (is this a high- or a low-arousal game state for the player?). The
limitations of ratings are mitigated, in part, via the use of class-based
questionnaires. By not providing information about the intensity of each player
state, however, classes do not have the level of granularity ratings naturally have.
A class-based questionnaire might also yield annotations with an unbalanced number
of samples per class. A common practice in psychometrics consists of transforming
sets of consecutive ratings into separate classes (e.g., see [226, 260] among many).
In an example study esa, arousal ratings on a 7-point scale are transformed into
high, neutral and low arousal classes using 7-5, 4 and 3-1 ratings, respectively.
While doing so might seem appropriate, the ordinal relation among classes is not
being taken into account. More importantly, the transformation process adds a new
set of biases to the subjectivity bias of ratings, namely class splitting criteria
[436].
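
The effect of the class splitting criteria can be sketched in a few lines. The
example below assumes the 7-5 / 4 / 3-1 split mentioned above as its default and
compares it with a second, equally arbitrary split; the function name
`rating_to_class`, its parameters and the ratings are hypothetical and serve only as
an illustration.

```python
def rating_to_class(rating, low_max=3, high_min=5):
    """Map a rating on a 1-7 scale to 'low', 'neutral' or 'high'.

    The thresholds are the (ad-hoc) class splitting criteria: ratings up to
    low_max become 'low', ratings from high_min upwards become 'high'.
    """
    if not 1 <= rating <= 7:
        raise ValueError("rating must lie on the 1-7 scale")
    if rating <= low_max:
        return "low"
    if rating >= high_min:
        return "high"
    return "neutral"


# Hypothetical arousal ratings from one study.
ratings = [1, 2, 4, 4, 5, 5, 6, 7]

# Split A: 3-1 -> low, 4 -> neutral, 7-5 -> high (the split described in the text).
split_a = [rating_to_class(r) for r in ratings]
# Split B: 2-1 -> low, 3-5 -> neutral, 7-6 -> high (another plausible choice).
split_b = [rating_to_class(r, low_max=2, high_min=6) for r in ratings]

for label in ("low", "neutral", "high"):
    print(label, split_a.count(label), split_b.count(label))
```

The same ratings yield different class labels and different class balances
depending on where the thresholds are placed, and the ordering among low, neutral
and high is lost once the labels are treated as nominal.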

Finally, rank-based questionnaires ask the annotator to rank a preference among
options such as two or more sessions of the game [763]. In its simplest form, the
annotator compares two options and specifies which one is preferred under a given
statement (pairwise preference). With more than two options, the participants are
asked to provide a ranking of some or all the options. Examples of rank-based
questions include: was that level more engaging than this level? which facial
expression looks happier? Another example of a rank-based questionnaire
(4-alternative forced choice) is illustrated in Fig. 5.7(b). Being a form of
subjective reporting, rank-based questionnaires (as much as rating-based and
class-based questionnaires) are associated with the well-known limitations of memory
biases and self-deception. Reporting about subjective constructs such as experience,
preference or emotion via rank-based questionnaires, however, has recently attracted
the interest of researchers in marketing IMI, psychology ca, user modeling
117611 [JTll and affective computing [765, 721, 436, 455, 773] among other fields.
This gradual paradigm shift is driven by both the reported benefits of ranks in
minimizing the effects of self-reporting subjectivity biases and recent findings
demonstrating the advantages of ordinal annotation [765, 773, 455].


5.5.2.7 What Is the Value of Player Experience?

Describing, labeling and assigning values to subjective notions, such as player
experience, is a non-trivial task as evidenced by a number of disciplines including
neuroscience [607], psychology [258], economics [630], and artificial intelligence
[315]. Annotators can attempt to assign numbers to such notions in an absolute
manner, using for instance a rating scale. Annotators can alternatively assign
values in a relative fashion, using for instance a ranking. There are, however, a
multitude of theoretical and practical reasons to doubt that subjective notions can
be encoded as numbers in the first place [765]. For instance, according to Kahneman
EHl, co-founder of behavioral economics, “...it is safe to assume that changes are
more accessible than absolute values”; his theory about judgment heuristics is built
on Herbert Simon’s psychology of bounded rationality [630]. Further, an important
thesis in psychology, named adaptation level theory [258], suggests that humans
lack the ability to maintain a constant value about subjective notions and their
preferences about options are, instead, made on a pairwise comparison basis using an
internal ordinal scale [460]. The thesis claims that while we are efficient at
discriminating among options, we are not good at assigning accurate absolute values
for the intensity of what we perceive. For example, we are particularly bad at
assigning absolute values to tension, frequency and loudness of sounds, the
brightness of an image, or the arousal level of a video. The above theories have
also been supported by neuroscientific evidence suggesting that experience with
stimuli gradually creates our own internal context, or anchor [607], against which
we rank any forthcoming stimulus or perceived experience. Thus, our choice about an
option is driven by our internal ordinal representation of that particular option
within a sample of options, not by any absolute value of that option [658].

[Figure: (a) Rating: a 5-point Likert item example ("X is frustrating", rated from
-2 (Disagree) to 2 (Agree)); (b) Rank: a 4-alternative forced choice example ("Is X
or Y more frustrating?" with the options X, Y, "Both are equally frustrating" and
"Neither is frustrating").]

Fig. 5.7 Examples of rating-based (a) vs. rank-based (b) questionnaires.

As a remote observation, one may argue that the relative assessment provides
less information than the absolute assessment since it does not express a quantity
explicitly and only provides ordinal relations. As argued earlier, however, any
additional information obtained in an absolute fashion (e.g., when ratings are
treated as numbers) violates basic axioms of applied statistics. Thus the value of
the additional information obtained (if any) is questioned directly [765].

In summary, results across different domains investigating subjective assessment
suggest that relative (rank-based) annotations minimize the assumptions made about
experiment participants’ notions of highly subjective constructs such as player
experience. Further, annotating experience in a relative fashion, instead of an
absolute fashion, leads to the construction of more generalizable and accurate
computational models of experience [765, 436].


5.5.3 No Output 

Very often we are faced with datasets where target outputs about player behavioral
or experience states are not available. In such instances modeling of players must
rely on unsupervised learning [176, 244, 178] (see Fig. 5.1). Unsupervised learning,
as discussed in Chapter]^ focuses on fitting a model to observations by discovering
associations of the input and without having access to a target output. The input is
generally treated as a set of random variables and a model is built through the
observations of associations among the input vectors. Unsupervised learning as
applied to modeling players involves tasks such as clustering and association
mining which are described in Section 5.6.3.

It may also be the case that we do not have target outputs available but,
nevertheless, we can design a reward function that characterizes behavioral or
experiential patterns of play. In such instances we can use reinforcement learning
approaches to discover policies about player behavior or player experience based on
in-game play traces or other state-action representations (see Section 5.6.2). In
the following section we detail the approaches used for modeling players in a
supervised learning, reinforcement learning and unsupervised learning fashion.


5.6 How Can We Model Players? 


In this section we build upon the data-driven approach of player modeling and
discuss the application of supervised, reinforcement and unsupervised learning to
model players, their behavior and their experience. To showcase the difference
between the three learning approaches let us suppose we wish to classify player
behavior. We can only use unsupervised learning if no behavioral classes have been
defined a priori [176]. We can instead use supervised learning if, for example, we
have already obtained an initial classification of players (either manually or via
clustering) and we wish to fit new players into these predefined classes ifTTSl .
Finally, we can use reinforcement learning to derive policies that imitate
different types of playing behavior or style. In Section 5.6.1 we focus on the
supervised learning paradigm whereas in Section 5.6.2 and Section 5.6.3 we outline,
respectively, the reinforcement learning and the unsupervised learning approach for
modeling players. All three machine learning approaches are discussed in Chapter]^


5.6.1 Supervised Learning 

Player modeling consists of finding a function that maps a set of measurable
attributes of the player to a particular player state. Following the supervised
learning approach this is achieved by machine learning, or automatically adjusting,
the parameters of a model to fit a dataset that contains a set of input samples,
each one paired with target outputs. The input samples correspond to the list of
measurable attributes (or features) while the target outputs correspond to the
annotations of the player’s states that we wish to learn to predict for each of the
input samples. As mentioned already, the annotations may vary from behavioral
characteristics, such as completion times of a level or player archetypes, to
estimates of player experience, such as player frustration.

As we saw in Chapter]^ popular supervised learning techniques, including arti- 
ficial neural networks (shallow or deep architectures), decision trees, and support 
vector machines, can be used in games for the analysis, the imitation and the pre- 
diction of player behavior, and the modeling of playing experience. The data type 
of the annotation determines the output of the model and, in turn, the type of the 
machine learning approach that can be applied. The three supervised learning alter- 
natives for learning from numerical (or interval), nominal and ordinal annotations— 
respectively, regression, classification and preference learning —are discussed in 
this section. 


5.6.1.1 Regression 

When the outputs that a player model needs to approximate are interval values, the
modeling problem is known as metric or standard regression. Any regression
algorithm is applicable to the task, including linear or polynomial regression,
artificial neural networks and support vector machines. We refer the reader to
Chapter|^ for details on a number of popular regression algorithms.

Regression algorithms are appropriate for imitation and prediction tasks of
player behavior. When the task is modeling player experience, however, the data
analysis calls for caution. While it is possible, for instance, to use regression
algorithms to learn the exact numeric ratings of experience, in general this should
be avoided because regression methods assume that the target values follow an
interval (numerical) scale. Ratings naturally define an ordinal scale [765, 773]
instead. As mentioned already, ordinal scales such as ratings should not be
converted to numerical values due to the subjectivity inherent to reports, which
imposes a non-uniform and varying distance among questionnaire items [778, 657].
Prediction models trained to approximate a real-value representation of a rating,
even though they may achieve high prediction accuracies, do not necessarily capture
the true reported playing experience because the ground truth used for training and
validation of the model has been undermined by the numerous effects discussed above.
We argue that the known fundamental pitfalls of self-reporting outlined above
provide sufficient evidence against the use of regression for player experience
modeling GMlEni 5531 ESU. Thus, we leave the evaluation of regression methods on
experience annotations outside the scope of this book.


5.6.1.2 Classification 

Classification is the appropriate form of supervised learning for player modeling
when the annotation values represent a finite and non-structured set of classes.
Classification methods can infer a mapping between those classes and player
attributes. Available algorithms include artificial neural networks, decision trees,
random forests, support vector machines, k-nearest neighbors, and ensemble learning
among many others. Further details about some of these algorithms can be found in
Chapter!^

Classes can represent playing behavior which needs to be imitated or predicted,
such as completion times (e.g., expressed as low, average or high completion time)
or user retention in a free-to-play game (e.g., expressed as weak, mild or strong
retention). Classes can alternatively represent player experience such as an excited
versus a frustrated player as manifested from facial expressions or low, neutral and
high arousal states for a player.

Classification is perfectly suited for the task of modeling player experience if
discrete annotations of experience are selected from a list of possibilities and
provided as target outputs [153, 344]. In other words, annotations of player
experience need to be nominal for classification to be applied. A common practice,
however, as already mentioned in Section 5.5.2.6, is to treat ratings of experience
as classes and transform the ordinal scale that defines ratings into a nominal scale
of separate classes. For example, ratings of arousal that lie between -1 and 1 are
transformed into low, neutral and high arousal classes. By classifying ratings, not
only is the ordinal relation among the introduced classes ignored but, most
importantly, the transformation process induces several biases to the data (see
Section 5.5.2.6). These biases appear to be detrimental and mislead the search
towards the ground truth of player experience [765, 436].


5.6.1.3 Preference Learning 

As an alternative to regression and classification methods, preference learning
Eia methods are designed to learn from ordinal data such as ranks or preferences.
It is important to note that the training signal in the preference learning paradigm
merely provides information about the relative relation between instances of the
phenomenon we attempt to approximate. Target outputs that follow an ordinal scale do
not provide information about the intensity (regression) or the clusters
(classification) of the phenomenon.

Generally we could construct a player model based on in-game behavioral
preferences. The information that this player, for example, prefers the mini-gun
over a number of other weapons could form a set of pairwise preferences we could
learn from. Alternatively we can build a model based on experience preferences. A
player, for instance, reported that area X of the level is more challenging than
area Y of the same level. Based on a set of such pairwise preferences we can derive
a global function of challenge for that player.
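
As a minimal sketch of how a global function can be derived from pairwise
preferences, the example below learns a linear utility with a perceptron-style
update on feature differences. This particular learner is our own choice for
brevity and is not one of the methods cited in this section; the two area features
(number of enemies, number of gaps) and all function names are hypothetical.

```python
def utility(weights, features):
    """Linear utility (e.g., predicted challenge) of one area."""
    return sum(w * x for w, x in zip(weights, features))


def train_pairwise_perceptron(pairs, n_features, epochs=100, learning_rate=0.1):
    """Learn weights so that utility(preferred) > utility(other) for each pair.

    pairs: list of (preferred_features, other_features) tuples, where
    'preferred' means 'reported as more challenging' in this example.
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        mistakes = 0
        for preferred, other in pairs:
            diff = [p - o for p, o in zip(preferred, other)]
            # A pair is violated if the preferred item does not score strictly higher.
            if utility(weights, diff) <= 0:
                weights = [w + learning_rate * d for w, d in zip(weights, diff)]
                mistakes += 1
        if mistakes == 0:  # all training preferences are respected
            break
    return weights


# Hypothetical areas described by (number of enemies, number of gaps).
# Each tuple reads: the first area was reported as more challenging than the second.
pairs = [((5, 1), (2, 1)), ((3, 4), (3, 1)), ((4, 2), (1, 0))]
weights = train_pairwise_perceptron(pairs, n_features=2)

for preferred, other in pairs:
    print(preferred, ">", other, ":", utility(weights, preferred) > utility(weights, other))
```

Only the order within each pair is used for training. The learned utility values
are meaningful relative to each other and should not be read as absolute amounts of
challenge.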

As outlined in Chapter a large palette of algorithms is available for the task
of preference learning. Many popular classification and regression techniques have
been adapted to tackle preference learning tasks, including linear statistical models
such as linear discriminant analysis and large margins, and non-linear approaches
such as Gaussian processes [122], deep and shallow artificial neural networks [430],
and support vector machines [302].

Preference learning has already been extensively applied to modeling aspects of
players. For example, Martínez et al. [430, 431] and Yannakakis et al. [780, 771]
have explored several artificial neural network approaches to learn to predict
affective and cognitive states of players reported as pairwise preferences.
Similarly, Garbarino et al. [217] have used linear discriminant analysis to learn
pairwise enjoyment predictors in racing games. To facilitate the use of proper
machine learning methods on preference learning problems, a number of such
preference learning methods as well as data preprocessing and feature selection
algorithms have been made available as part of the Preference Learning Toolbox
(PLT) [198]. PLT is an open-access, user-friendly and accessible toolkit built and
constantly updated for the purpose of easing the processing of (and promoting the
use of) ranks.

As ratings, by definition, express ordinal scales they can directly be transposed to
any ordinal representation (e.g., pairwise preferences). For instance, given an
annotator’s rating indicating that a condition A felt ‘slightly frustrating’ and a
condition B felt ‘very frustrating’, a preference learning method can train a model
that predicts a higher level of frustration for B than for A. In this way the
modeling approach avoids introducing artifacts about the actual difference between
‘very’ and ‘slightly’ or about how this particular annotator uses the scale.
Further, the limitation of different subjective scales across users can be safely
bypassed by transforming rating reports into ordinal relations on a per-annotator
basis. Finally, the problem of the scale varying across time due to episodic memory
still persists but can be minimized by transforming only consecutive reports; i.e.,
given a report for three conditions A, B and C, the player model can be trained
using only the relation between A and B, and B and C (dropping the comparison
between A and C).
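
The transformation described above can be sketched as follows. The function and
variable names are hypothetical, ratings are assumed to be given in the order in
which each annotator reported them, and equal ratings are assumed to yield no
preference.

```python
def consecutive_preferences(reports):
    """Turn one annotator's ordered ratings into pairwise preferences.

    reports: list of (condition, rating) tuples in reporting order.
    Returns a list of (higher_rated, lower_rated) condition pairs built only
    from consecutive reports; equal ratings produce no pair.
    """
    pairs = []
    for (cond_1, rating_1), (cond_2, rating_2) in zip(reports, reports[1:]):
        if rating_1 > rating_2:
            pairs.append((cond_1, cond_2))
        elif rating_2 > rating_1:
            pairs.append((cond_2, cond_1))
    return pairs


def preferences_per_annotator(reports_by_annotator):
    """Apply the transformation separately to every annotator's reports."""
    return {annotator: consecutive_preferences(reports)
            for annotator, reports in reports_by_annotator.items()}


# Hypothetical frustration ratings on a 1-5 scale for conditions A, B and C.
reports_by_annotator = {
    "annotator_1": [("A", 2), ("B", 5), ("C", 4)],
    "annotator_2": [("A", 4), ("B", 4), ("C", 5)],
}
print(preferences_per_annotator(reports_by_annotator))
```

For the first annotator only the relations between A and B and between B and C are
kept, and the comparison between A and C is dropped. Ratings of different
annotators are never compared with each other.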


5.6.1.4 Summary: The Good, the Bad and the Ugly 

The last section on supervised learning is dedicated to the comparison among the 
three methods—regression, classification and preference learning—for modeling 
players. Arguably the discussion is limited when the in-game behavior of players 
is imitated or predicted. If behavioral data about players follows interval, nomi- 
nal or ordinal scales then naturally, regression, classification and preference learn¬ 
ing should be applied, respectively. Behavioral data have an objective nature which 
makes the task of learning less challenging. Given the subjective notion of player ex- 
perience, however, there are a number of caveats and limitations of each algorithm 
that need to be taken into account. Below we discuss the comparative advantages 
of each and we summarize the key outcomes of supervised learning as applied to 
modeling the experience of players. 

Regression vs. Preference Learning: Motivated by psychological studies suggesting
that interval ratings misrepresent experience [515, 455, 361], we will not dedicate
an extensive comparison to preference learning and regression methods. The
performance comparison between a regression and a preference-learned model is also
irrelevant as the former is arguably a priori incapable of capturing the underlying
experience phenomenon as precisely as the latter. Such deviations from the ground
truth, however, are not trivial to illustrate through a data modeling approach and
thus the comparison is not straightforward. The main reason is that the objective
ground truth is fundamentally ill-defined when numbers are used to characterize
subjective notions such as player experience.

^ Preference Learning Toolbox (PLT): http://sourceforge.net/projects/pl-toolbox/

Regression vs. Classification: Classes are easy to analyze and create player
models from. Further, their use eliminates part of the inter-personal biases
introduced with ratings. For these reasons classification should be preferred to
regression for player experience modeling. We already saw that classification,
instead of regression, is applied when ratings are available to overcome part of the
limitations inherent in rating-based annotation. For instance, this can be achieved
by transforming arousal ratings to high, neutral and low arousal classes [255].
While this common practice in psychometrics eliminates part of the rating
subjectivity, it adds new forms of data biases inherent in the ad-hoc decisions to
split the classes. Further, the analysis of player models across several case
studies in the literature has already shown that transforming ratings into classes
creates a more complicated machine learning problem [765, 436].

Classification vs. Preference Learning: Preference learning is the supreme
method for modeling experience when ranks or pairwise preferences are available.
Even when ratings or classes are available, comparisons between classification and
preference learning player models in the literature suggest that preference learning
methods lead to more efficient, generic and robust models which capture more
information about the ground truth [765]. Indicatively, Crammer and Singer MI471
compare classification, regression and preference learning training algorithms in a
task to learn ratings. They report the supremacy of preference learning over the
other methods based on several synthetic datasets and a movie-ratings dataset. In
addition, extensive evidence already shows that preference learning better
approximates the underlying function between input (e.g., experience manifestations
such as gameplay) and output (e.g., annotations) [436]. Figure 5.8 showcases how
much closer a preference learned model can reach a hypothesized (artificial) ground
truth, compared to a classification model trained on an artificial dataset. In
summary, preference learning via rank-based annotation controls for reporting memory
effects, eliminates subjectivity biases and builds models that are closer to the
ground truth of player experience [778, 777].

Grounded in extensive evidence, our final note for the selection of a supervised
learning approach for modeling player experience is clear: independently of how
experience is annotated, we argue that preference learning (the good) is a superior
supervised learning method for the task at hand, classification (the ugly) provides
a good balance between simplicity and approximation of the ground truth of player
experience, whereas regression (the bad) is based on rating annotations which are of
questionable quality with respect to their relevance to the true experience.

[Figure: (a) Ground truth; (b) Classification; (c) Preference learning]

Fig. 5.8 A hypothesized (artificial) ground truth function (z-axis) which is
dependent on two player attributes, x1 and x2 (Fig. 5.8(a)), the best classification
model (Fig. 5.8(b)) and the best preference learned model (Fig. 5.8(c)). All images
are retrieved from [436].


5.6.2 Reinforcement Learning 

While it is possible to use reinforcement learning to model aspects of users during
their interaction, the RL approach for modeling players has been tried mostly in
comparatively simplistic and abstract games [195] and has not seen much application
in computer games. The key motivation for the use of RL for modeling players is
that it can capture the relative valuation of game states as encoded internally by
humans during play [685]. At first glance, player modeling with RL may seem to be
an application of RL for game playing, and we discuss this in Chapter as part of
the play for experience aim. In this section, we instead discuss this approach from
the perspective that a policy learned via RL can capture internal player states with
no corresponding absolute target values such as decision making, learnability,
cognitive skills or emotive patterns. Further, those policies can be trained on
player data such as play traces. The derived player model depicts
psychometrically-valid, abstract simulations of a human player’s internal cognitive
or affective processes. The model can be used directly to interpret human play, or
indirectly, it can be featured in AI agents which can be used as playtesting bots
during the game design process, as baselines for adapting agents to mimic classes of
human players, or as believable, human-like opponents [268].

Using the RL paradigm, we can construct player models if a reward signal can adjust
a set of parameters that characterize the player. The reward signal can be based
either directly on the in-game behavior of the player (for instance, the decision
taken at a particular game state) or indirectly on player annotations (e.g.,
annotated excitement throughout the level) or objective data (e.g., physiological
indexes showing player stress). In other words, the immediate reward function can be
based on gameplay data if the model aims to predict the behavior of the player or,
instead, be based on any objective measure or subjective report if the model
attempts to predict the experience of the player. The representation of the RL
approach can be anything from a standard Q table that, for instance, models the
decision making behavior of a player (e.g., as in [268, 685]) to an ANN that, e.g.,
models in-game behavior (as in [267]), to a set of behavior scripts [650] that are
adjusted to imitate gameplay behavior via RL, to a deep Q network.
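
As a minimal sketch of the simplest representation mentioned above, the example
below fits a Q table by replaying logged transitions with the standard tabular
Q-learning update. Everything specific in it is an assumption made for
illustration: the states, the actions, the reward values and the function names are
hypothetical, and in practice the reward would be derived from gameplay data,
annotations or physiological measures as discussed in the text.

```python
from collections import defaultdict


def q_learning_from_traces(traces, actions, alpha=0.5, gamma=0.9, sweeps=50):
    """Fit a Q table by repeatedly replaying logged (s, a, r, s') transitions.

    next_state is None when the transition ends the episode.
    """
    q = defaultdict(float)
    for _ in range(sweeps):
        for state, action, reward, next_state in traces:
            if next_state is None:
                best_next = 0.0
            else:
                best_next = max(q[(next_state, a)] for a in actions)
            # Standard Q-learning update towards the bootstrapped target.
            target = reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
    return q


def greedy_action(q, state, actions):
    """Action with the highest learned value in the given state."""
    return max(actions, key=lambda a: q[(state, a)])


# Hypothetical play traces: (state, action, reward, next_state).
actions = ["flee", "attack", "explore"]
traces = [
    ("low_health", "flee", 1.0, "safe"),
    ("low_health", "attack", -1.0, None),
    ("safe", "explore", 0.5, None),
]

q_table = q_learning_from_traces(traces, actions)
print(greedy_action(q_table, "low_health", actions))
```

The learned table encodes a relative valuation of state-action pairs for the
modeled player. A runtime variant would apply the same update incrementally as new
transitions arrive during play.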

We can view two ways of constructing models of players via RL: models can
be built offline (i.e., before gameplay starts) or at runtime (i.e., during play).
We can also envision hybrid approaches by which models are first built offline and
then are polished at runtime. Offline RL-based modeling adds value to our
playtesting capacity via, for instance, procedural personas (see Section 5.7.1.3)
whereas runtime RL-based player modeling offers dynamicity to the model with respect
to time. Runtime player modeling further adds to the capacity of the model to adapt
to the particular characteristics of the player, thereby increasing the degree of
personalization. We can think of models of players, for instance, that are
continuously tailored to the current player by using the player’s in-game
annotations, behavioral decisions, or even physiological responses during the game.

This way of modeling players is still in its infancy with only a few studies
existent on player behavioral modeling in educational games [488], in roguelike
adventure games [268, 267] via TD learning or evolutionary reinforcement learning,
and in first-person shooter games [685] via inverse RL. However, the application of
RL for modeling users beyond games has been quite active for the purposes of
modeling web-usage data and interactions on the web [684] or modeling user
simulations in dialog systems [114, 225, 595]. Normally in such systems a
statistical model is first trained on a corpus of human-computer interaction data
for simulating (imitating) user behavior. Then reinforcement learning is used to
tailor the model towards an optimal dialog strategy which can be found through trial
and error interactions between the user and the simulated user.


5.6.3 Unsupervised Learning 

The aim of unsupervised learning (see Chapter]^) is to derive a model given a
number of observations. Unlike in supervised learning, there is no specified target
output. Unlike in reinforcement learning, there is no reward signal. (In short,
there is no training signal of any kind.) In unsupervised learning, the signal is
hidden internally in the interconnections among data attributes. So far,
unsupervised learning has mainly been applied to two core player modeling tasks:
clustering behaviors and mining associations between player attributes. While
Chapter]^ provides the general description of these unsupervised learning
algorithms, in this section we focus on their specific application for modeling
players.


5.6.3.1 Clustering 


As discussed in Chapter 2, clustering is a form of unsupervised learning that aims to
find clusters in datasets so that data within a cluster are similar to each other and
dissimilar to data in other clusters. When it comes to the analysis of user behavior
in games, clustering offers a means for reducing the dimensionality of a dataset,
thereby yielding a manageable number of critical features that represent user behavior.
Relevant data for clustering in games include player behavior, navigation patterns,
assets bought, items used, game genres played and so on. Clustering can be
used to group players into archetypical playing patterns in an effort to evaluate how
people play a particular game and as part of a user-oriented testing process [176].
Further, one of the key questions of user testing in games is whether people play the
game as intended. Clustering can be utilized to derive a number of different playing
or behavioral styles directly addressing this question. Arguably, the key challenge
in successfully applying clustering in games is that the derived clusters should have
an intelligible meaning with respect to the game in question. Thus, clusters should
be clearly interpretable and labeled in a language that is meaningful to the involved
stakeholders (such as designers, artists and managers) [176, 178]. In the case studies
of Section 5.7.1 we meet the above challenges and demonstrate the use of clustering
in the popular Tomb Raider: Underworld (EIDOS Interactive, 2008) game.


5.6.3.2 Frequent Pattern Mining

In Chapter 2 we defined frequent pattern mining as the set of problems and techniques
related to finding patterns and structures in data. Patterns include sequences
and itemsets. Both frequent itemset mining (e.g., Apriori fh)) and frequent se- 
quence mining (e.g., GSP II652I ') are relevant and useful for player modeling. The 
key motivation for applying frequent pattern mining on game data is to find inherent
and hidden regularities in the data. In that regard, key player modeling problems,
such as player type identification and detection of player behavior patterns, can be
viewed as frequent pattern mining problems. Frequent pattern mining can, for example,
be used to discover what game content is often purchased together (e.g.,
players that buy X tend to buy Y too) or what the subsequent actions are after dying
in a level (e.g., players that die often in the tutorial level pick up more health packs
in level 1) lfT20ll^ .
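
The "players that buy X tend to buy Y too" idea can be illustrated with a minimal sketch. It covers only the pair-counting step (support and confidence), not the full candidate generation of Apriori, and the purchase logs and item names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return {pair: support} for item pairs that co-occur often enough."""
    n = len(transactions)
    counts = Counter()
    for basket in transactions:
        # Count each unordered pair at most once per transaction.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent bought | antecedent bought)."""
    with_antecedent = [t for t in transactions if antecedent in t]
    if not with_antecedent:
        return 0.0
    return sum(consequent in t for t in with_antecedent) / len(with_antecedent)


# Hypothetical purchase logs: one list of bought items per player.
purchases = [
    ["sword", "shield"],
    ["sword", "shield", "potion"],
    ["sword", "potion"],
    ["shield"],
]
pairs = frequent_pairs(purchases, min_support=0.5)
conf = confidence(purchases, "sword", "shield")
```

Here `pairs` keeps the item pairs bought together by at least half of the players, and `conf` estimates how often a sword buyer also buys a shield.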


5.7 What Can We Model? 

As already outlined at the beginning of this chapter, modeling of users in games can 
be classified into two main tasks: modeling of the behavior of players and modeling
of their experience. It is important to remember that modeling of player behavior 
is (mostly) a task of objective nature whereas the modeling of player experience is 
subjective given the idiosyncratic nature of playing experience. The examples we 
present in the remainder of this chapter highlight the various uses of AI for modeling 
players. 


5.7.1 Player Behavior 

In this section, we exemplify player behavior modeling via three representative use 
cases. The first two examples are based on a series of studies on player modeling by
Drachen et al. in 2009 [176] and later on by Mahlmann et al. in 2010 [414] in the
Tomb Raider: Underworld (EIDOS Interactive, 2008) game. The analysis includes
both the clustering of players [176] and the prediction [414] of their behavior,
which makes it an ideal case study for the purposes of this book. The third study
presented in this section focuses on the use of play traces for the procedural creation 
of player models. That case study explores the creation of procedural personas in 
the MiniDungeons puzzle roguelike game. 













Fig. 5.9 A screenshot from the Tomb Raider: Underworld (EIDOS Interactive, 2008) game featuring
the player character, Lara Croft. The game is used as one of the player behavior modeling
case studies presented in this book. Image obtained from Wikipedia (fair use). 


5.7.1.1 Clustering Players in Tomb Raider: Underworld 


Tomb Raider: Underworld (TRU) is a third-person perspective, advanced platform-puzzle
game, where the player has to combine strategic thinking in planning the
3D movements of Lara Croft (the game’s player character) and problem solving in
order to go through a series of puzzles and navigate through a number of levels (see
Fig. 5.9).


The dataset used for this study includes entries from 25,240 players. The 1,365
players who completed the game were selected and used for the analysis presented
below. Note that TRU consists of seven main levels plus a tutorial level. Six features
of gameplay behavior were extracted from the data: number of
deaths by opponents, number of deaths by the environment (e.g., fire, traps, etc.),
number of deaths by falling (e.g., from ledges), total number of deaths, game completion
time, and the number of times help was requested. All six features were calculated on
the basis of completed TRU games. The selection of these particular features was 
based on the core game design of the TRU game and their potential impact on the 
process of distinguishing among dissimilar patterns of play. 

Three different clustering techniques were applied to the task of identifying the 
number of meaningful and interpretable clusters of players in the data: k-means, 
hierarchical clustering and self-organizing maps. While the first two have been covered
in Chapter 2, we will briefly outline the third method here.

A self-organizing map (SOM) [347] creates and iteratively adjusts a low-dimensional
projection of the input space via vector quantization. In particular, a
type of large SOM called an emergent self-organizing map [727] was used in conjunction
with reliable visualization techniques to help us identify clusters. A SOM
consists of neurons organized in a low-dimensional grid. Each neuron in the grid
(map) is connected to the input vector through a connection weight vector. In addition
to the input vector, the neurons are connected to neighbor neurons of the
map through neighborhood interconnections which generate the structure of the
map; rectangular and hexagonal lattices organized in a two-dimensional sheet or a
three-dimensional toroid shape are some of the most popular topologies used. SOM
training can be viewed as a vector quantization algorithm which resembles the
k-means algorithm. What differentiates SOM, however, is the update of the topological
neighbors of the best-matching neuron; a best-matching neuron is a neuron
for which there exists at least one input vector for which the Euclidean distance to
the weight vector of this neuron is minimal. As a result, the whole neuron neighborhood
is stretched towards the presented input vector. The outcome of SOM training
is that neighboring neurons have similar weight vectors which can be used for projecting
the input data to the two-dimensional space and thereby clustering a set of
data through observation on a 2D plane. For a more detailed description of SOMs,
the reader is referred to [347].
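
The SOM update described above can be sketched in a few lines. This is a minimal illustration on a rectangular grid and not the emergent SOM used in the study; the Gaussian neighborhood, the linear decay schedules, the parameter values and the toy data are all assumptions made for the example.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid coordinates of the neuron whose weight vector is closest to x."""
    best, best_d = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(x, w))
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rectangular SOM; returns weights[r][c] as a list of floats."""
    rng = random.Random(seed)
    dim = len(data[0])
    radius0 = max(rows, cols) / 2
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    for epoch in range(epochs):
        # Learning rate and neighborhood radius shrink as training proceeds.
        frac = 1.0 - epoch / epochs
        lr, radius = lr0 * frac, max(radius0 * frac, 0.5)
        for x in rng.sample(data, len(data)):
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    # Neurons near the best-matching unit on the grid move more.
                    h = math.exp(-grid_d2 / (2 * radius ** 2))
                    w = weights[r][c]
                    for i in range(dim):
                        w[i] += lr * h * (x[i] - w[i])
    return weights


# Hypothetical two-feature player data (already scaled to [0, 1]).
data = [[0.1, 0.1], [0.15, 0.05], [0.9, 0.9], [0.85, 0.95]]
som = train_som(data, rows=3, cols=3)
```

Updating the whole grid neighborhood of the best-matching unit, rather than a single centroid, is what separates this loop from plain k-means.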

To get some insight into the possible number of clusters existent in the data,
k-means was applied for all k values less than or equal to 20. The sum of the Euclidean
distances between each player instance and its corresponding cluster centroid (i.e., the
quantization error) was calculated for all 20 trials of k-means. The analysis reveals
that the percent decrease of the mean quantization error due to the increase of k is
notably high when k = 3 and k = 4. For k = 3 and k = 4 this value equals 19%
and 13%, respectively, while it lies between 7% and 2% for k > 4. Thus, the k-means
clustering analysis provides the first indication of the existence of three or four main
player behavioral clusters within the data.
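
The procedure of this paragraph can be sketched as follows: run k-means for increasing k, record the mean quantization error, and report the percent decrease from k-1 to k. This is a minimal pure-Python sketch with a plain Lloyd's k-means; the initialization, the helper names and the four toy points are assumptions and not the setup of the study.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = None
    for _ in range(iters):
        new_assign = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_assign == assign:
            break  # assignments are stable, so the centroids are too
        assign = new_assign
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


def quantization_error(points, centroids, assign):
    """Sum of Euclidean distances from each point to its cluster centroid."""
    return sum(sq_dist(p, centroids[a]) ** 0.5 for p, a in zip(points, assign))


def percent_decrease(points, max_k):
    """Percent drop of the mean quantization error when going from k-1 to k."""
    errors = []
    for k in range(1, max_k + 1):
        centroids, assign = kmeans(points, k)
        errors.append(quantization_error(points, centroids, assign) / len(points))
    return {k: 100 * (errors[k - 2] - errors[k - 1]) / errors[k - 2]
            for k in range(2, max_k + 1)}


# Hypothetical data: two well-separated groups of players.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
drops = percent_decrease(points, max_k=3)
```

A large value of `drops[k]` followed by small values for larger k is the kind of "elbow" that the analysis above reads as evidence for k clusters.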

As an alternative approach to k-means, hierarchical clustering is also applied to
the dataset. This approach seeks to build a hierarchy of clusters existent in the data.
Ward’s clustering method [747] is used to specify the clusters in the data, with
the squared Euclidean distance used as a measure of dissimilarity between
data vector pairs. The resulting dendrogram is depicted in Fig. 5.10(a). As noted
in Chapter 2, a dendrogram is a treelike diagram that illustrates the merging of data
into clusters as a result of hierarchical clustering. It consists of many U-shaped
lines connecting the clusters. The height of each U represents the squared Euclidean
distance between the two clusters being connected. Depending on where the data
analyst sets the squared Euclidean distance threshold, a different number of clusters
can be observed.
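
To make the thresholding step concrete, here is a naive sketch of agglomerative clustering with Ward's criterion. The merge cost used below is the increase in within-cluster squared error, which is one common way to define the dendrogram height; the exact scaling used in the study is not stated here, so this is an assumption, as are the function names and the toy data.

```python
def ward_merges(points):
    """Naive agglomerative clustering with Ward's criterion.

    Returns the merge history as (members_a, members_b, cost) tuples, where
    cost is the increase in within-cluster squared error caused by the merge.
    """
    dim = len(points[0])
    clusters = [[i] for i in range(len(points))]
    history = []

    def centroid(members):
        return [sum(points[i][d] for i in members) / len(members)
                for d in range(dim)]

    def cost(a, b):
        ca, cb = centroid(a), centroid(b)
        d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * d2

    while len(clusters) > 1:
        # Pick the pair of clusters whose merge is cheapest.
        c, i, j = min((cost(a, b), i, j)
                      for i, a in enumerate(clusters)
                      for j, b in enumerate(clusters) if i < j)
        history.append((clusters[i], clusters[j], c))
        merged = clusters[i] + clusters[j]
        clusters = [m for n, m in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return history


def clusters_below(history, n_points, threshold):
    """Number of clusters left when only merges cheaper than threshold apply."""
    return n_points - sum(1 for _, _, c in history if c < threshold)


# Hypothetical data: two tight pairs of players far apart from each other.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
history = ward_merges(points)
n_clusters = clusters_below(history, len(points), threshold=1.0)
```

Moving `threshold` up or down plays the role of the horizontal line drawn across the dendrogram: a lower line leaves more clusters, a higher one fewer.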

Both k-means and hierarchical clustering already demonstrate that the 1,365
players can be clustered in a low number of different player types. K-means indicates
that there are three or four clusters while Ward’s dendrogram reveals the
existence of two populated and two smaller clusters, respectively, in the middle and
at the edges of the tree.

(a) Dendrogram of TRU data using Ward hierarchical clustering. A squared Euclidean distance of
4.5 (illustrated with a horizontal black line) reveals four clusters. Image adapted from [176].

(b) U-matrix visualization of a self-organizing map depicting the four player clusters identified in a
population of 1,365 TRU players (shown as small colored squares). Different square colors depict
different player clusters. Valleys represent clusters whereas mountains represent cluster borders.
Image adapted from [176].

Fig. 5.10 Detecting player types from TRU data using hierarchical (a) and SOM (b) clustering
methods.


Applying SOM, as the third alternative approach, allows us to cluster the TRU
data by observation on a two-dimensional plane. The U-matrix depicted in Fig.
5.10(b) is a visualization of the local distance structure in the data placed onto a
two-dimensional map. The average distance value between each neuron’s weight
vector and the weight vectors of its immediate neighbors corresponds to the height
of that neuron in the U-matrix (positioned at the map coordinates of the neuron).
Thus, U-matrix values are large in areas where no or few data points reside, creating
mountain ranges for cluster boundaries. On the other hand, visualized valleys indicate
clusters of data as small U-matrix values are observed in areas where the data
space distances of neurons are small.
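
The U-matrix height of a neuron, as defined above, can be computed directly from the trained weight grid. The sketch below assumes a rectangular, non-toroidal grid with 4-connected neighbors; the neighborhood choice and the toy weights are assumptions for illustration.

```python
def u_matrix(weights):
    """Average distance from each neuron's weights to its grid neighbors."""
    rows, cols = len(weights), len(weights[0])
    heights = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = []
            # 4-connected neighbors on a rectangular, non-toroidal grid.
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    d2 = sum((a - b) ** 2
                             for a, b in zip(weights[r][c], weights[nr][nc]))
                    dists.append(d2 ** 0.5)
            heights[r][c] = sum(dists) / len(dists) if dists else 0.0
    return heights


# Hypothetical 1x4 map with one-dimensional weights: two flat regions
# (valleys) separated by a jump between the second and third neuron.
toy_weights = [[[0.0], [0.0], [1.0], [1.0]]]
heights = u_matrix(toy_weights)
```

In this toy map the two middle neurons get the largest heights, forming the "mountain" that separates the two valleys.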

The SOM analysis reveals four main classes of behavior (player types) as depicted
in Fig. 5.10(b). The different colors of the U-matrix correspond to the four
different clusters of players. In particular, cluster 1 (8.68% of the TRU players)
corresponds to players that die very few times, whose deaths are caused mainly by the
environment, who do not request help from the game frequently and who complete
the game very quickly. Given such game skills these players were labeled as Veterans.
Cluster 2 (22.12%) corresponds to players that die quite often (mainly due
to falling), take a long time to complete the game (indicating a slow-moving,
careful style of play) and prefer to solve most puzzles in the game by themselves.
Players of this cluster were labeled as Solvers, because they excel particularly at
solving the puzzles of the game. Players of cluster 3 form the largest group of TRU
players (46.18%) and are labeled as Pacifists as they die primarily from active opponents.
Finally, the group of players corresponding to cluster 4 (16.56% of TRU
players), namely the Runners, is characterized by low completion times and frequent
deaths by opponents and the environment.

The results showcase how clustering of player behavior can be useful for evaluating
game designs. Specifically, TRU players seem not to merely follow a specific
strategy to complete the game but rather they fully explore the affordances provided
by the game in dissimilar ways. The findings are directly applicable to TRU game
design as clustering provides answers to the critical question of whether people play
the game as intended. The main limitation of clustering, however, is that the derived
clusters are not intuitively interpretable and that clusters need to be translated into
meaningful behavioral patterns to be useful for game design. Collaboration between
the data analyst and the designers of the game (as was performed in this
study) is essential for meaningful interpretations of the derived clusters. The benefit
of such a collaboration is both the enhancement of game design features and
the effective phenomenological debugging of the game [176]. In other words, we
make sure both that no feature of the game is underused or misused and that the
playing experience and the game balance are debugged.


5.7.1.2 Predicting Player Behavior in Tomb Raider: Underworld 

Building on the same set of TRU player data, a second study examined the possibilities
of predicting particular aspects of playing behavior via supervised learning
[414]. An aspect of player behavior that is particularly important for game design
is to predict when a player will stop playing. As one of the perennial challenges of
game design is to ensure that as many different types of players as possible are
facilitated in the design, being able to predict when players will stop playing a game
is of interest because it assists with locating potentially problematic aspects of game
design. Further, such information helps toward the redesign of a game’s monetization
strategy for maximizing user retention.

Data was drawn from the Square Enix Europe Metrics Suite and was collected
during a two-month period (1st Dec 2008 - 31st Jan 2009), providing records from
approximately 203,000 players. For the player behavior prediction task it was decided
to extract a subsample of 10,000 players which provides a large enough and
representative dataset for the aims of the study, while at the same time being manageable
in terms of computational effort. A careful data-preprocessing approach yielded
6,430 players that were considered for the prediction task; these players had completed
the first level of the game.

As in the TRU cluster analysis the features extracted from the data relate to the 
core mechanics of the game. In addition to the six features investigated in the clustering
study of TRU, the extracted features for this study include the number of times
the adrenalin feature of the game was used, the number of rewards collected, the
number of treasures found, and the number of times the player changes settings in
the game (including player ammunition, enemy hit points, player hit points, and recovery
time when performing platform jumps). Further details about these features
can be found in [414].

To test the possibility of predicting the TRU level the player completed last, a
number of classification algorithms were tested on the data using the Weka machine
learning software [243]. The approach followed was to experiment with at least one
algorithm from each of the algorithm families existent in Weka and to put additional
effort on those classification algorithms that were included in a recent list of the most
important algorithms in data mining: decision tree induction, backpropagation and
simple regression [759]. The resulting set of algorithms chosen for classification
is as follows: logistic regression, multi-layer perceptron backpropagation, variants
of decision trees, Bayesian networks, and support vector machines. In the following,
we only outline the most interesting results from those reported in [414]. For
all tested algorithms, the reported classification prediction accuracy was achieved
through 10-fold cross validation.

Most of the tested algorithms had similar levels of performance, and were able
to predict when a player will stop playing substantially better than the baseline. In
particular, considering only gameplay of level 1, classification algorithms reach an
accuracy between 45% and 48%, which is substantially higher than the baseline performance
(39.8%). When using additional features from level 2, the predictions are
much more accurate (between 50% and 78%) compared to the baseline (45.3%).
In particular, decision trees and logistic regression manage to reach accuracies of
almost 78% in predicting on what level a player will stop playing. The difference in
the predictive strength of using level 1 and 2 data as compared to only level 1 data
is partly due to the increased number of features used in the former case.
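
The comparison between cross-validated accuracy and a baseline can be sketched as follows. We assume here that the baseline is the accuracy of always predicting the most common class, which is a common convention but is not spelled out above. The one-feature threshold rule and the toy data are hypothetical stand-ins and have nothing to do with the Weka classifiers used in the study.

```python
import random
from collections import Counter


def majority_baseline(labels):
    """Accuracy obtained by always predicting the most common label."""
    return Counter(labels).most_common(1)[0][1] / len(labels)


def cross_val_accuracy(features, labels, fit, folds=10, seed=0):
    """Mean accuracy of fit(train_X, train_y) -> predict(x) over k folds."""
    idx = list(range(len(labels)))
    random.Random(seed).shuffle(idx)
    scores = []
    for f in range(folds):
        test = idx[f::folds]  # every folds-th shuffled index is held out
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        predict = fit([features[i] for i in train], [labels[i] for i in train])
        hits = sum(predict(features[i]) == labels[i] for i in test)
        scores.append(hits / len(test))
    return sum(scores) / folds


def fit_stump(X, y):
    """Hypothetical one-feature rule: split feature 0 at its training mean."""
    threshold = sum(x[0] for x in X) / len(X)
    sides = {True: Counter(), False: Counter()}
    for x, label in zip(X, y):
        sides[x[0] > threshold][label] += 1
    overall = Counter(y).most_common(1)[0][0]

    def predict(x):
        votes = sides[x[0] > threshold]
        return votes.most_common(1)[0][0] if votes else overall

    return predict


# Hypothetical data: time spent in an early area -> last level completed.
X = [[t] for t in range(20)]
y = [7 if t < 12 else 2 for t in range(20)]
baseline = majority_baseline(y)
accuracy = cross_val_accuracy(X, y, fit_stump, folds=10)
```

A classifier is only informative to the extent that `accuracy` exceeds `baseline`, which is the comparison made for the level 1 and level 2 feature sets above.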

Beyond accuracy, an important feature of machine learning algorithms is their
transparency and their expressiveness. The models are more useful to a data analyst
and a game designer if they can be expressed in a form which is easy to visualize and
comprehend. Decision trees of the form constructed by the ID3 algorithm [544]
and its many derivatives are excellent from this perspective, especially if pruned to
a small size. For instance, the extremely small decision tree depicted in Fig. 5.11 is
constrained to tree depth 2 and was derived on the set of players who completed both
levels 1 and 2 with a classification accuracy of 76.7%. The predictive capacity of
the decision tree illustrated in Fig. 5.11 is impressive given how extremely simple it
is. The fact that we can predict the final played level with a high accuracy, based
only on the amount of time spent in the room named Flush Tunnel of level 2 and
the total rewards collected in level 2, is very appealing for game design. What this
decision tree indicates is that the amount of time players spend within a given area
early in the game and how well they perform are important for determining if they
continue playing the game. Time spent on a task or in an area of the game can indeed
be indicative of challenges with progressing through the game, which can result in
a frustrating experience.

Fig. 5.11 A decision tree trained by the ID3 algorithm [544] to predict when TRU players will
stop playing the game. The leaves of the tree (ovals) indicate the number of the level (2, 3 or 7)
the player is expected to complete. Note that the TRU game has seven levels in total. The tree is
constrained to tree depth 2 and achieves a classification accuracy of 76.7%.


5.7.1.3 Procedural Personas in MiniDungeons 

Procedural personas are generative models of player behavior, meaning that they
can replicate in-game behavior and be used for playing games in the same role
as players; additionally, procedural personas are meant to represent archetypical
players rather than individual players [268, 267, 269]. A procedural persona can
be defined as the parameters of a utility vector that describe the preferences of a
player. For example, a player might allocate different weight to finishing a game
fast, exploring dialog options, getting a high score, etc.; these preferences can be
numerically encoded in a utility vector where each parameter corresponds to the
persona’s interest in a particular activity or outcome. Once these utilities are defined,
reinforcement learning via TD learning or neuroevolution can be used to find
a policy that reflects these utilities, or a tree search algorithm such as MCTS can be
used with these utilities as evaluation functions. Approaches similar to the procedural
persona concept have also been used for modeling the learning process of the
player in educational games via reinforcement learning [488].

As outlined in Section 5.4.1, MiniDungeons is a simple roguelike game which
features turn-based discrete movement, deterministic mechanics and full information.
The player avatar must reach the exit of each level to win it. Monsters block
the way and can be destroyed, at the cost of decreasing the player character’s health;
health can be restored by collecting potions. Additionally, treasures are distributed
throughout levels. In many levels, potions and treasures are placed behind monsters,
and monsters block the shortest path from the entrance to the exit. As in many games,
it is therefore possible to play MiniDungeons with different goals, such as reaching
the exit in the shortest possible time, collecting as much treasure as possible or
killing all the monsters (see Figs. 5.3 and 5.12).

These different playing styles can be formalized as procedural personas by attaching
differing utilities to measures such as the number of treasures collected, the
number of monsters killed or the number of turns taken to reach the exit. Q-learning
can be used to learn policies that implement the appropriate persona in single levels
[268], and evolutionary algorithms can be used to train neural networks that
implement a procedural persona across multiple levels [267]. These personas can
be compared with the play traces of human players by placing the persona in every
situation that the human encountered in the play trace and comparing the action
chosen by the procedural persona with the action chosen by the human (as you
are comparing human actions with those of a Q function, you might say that you
are asking “what would Q do?”). It is also possible to learn the utility values for a
procedural persona clone of a particular human player by evolutionary search for
those values that make the persona best match a particular play trace (see Fig. 5.12).
However, it appears that these “clones” of human players generalize less well than
designer-specified personas.
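
The comparison between a persona and a human play trace can be sketched as an agreement ratio. In this simplified illustration the persona picks greedily among actions whose outcomes are given directly, which stands in for the learned Q function or evolved network; the utility weights, action names and trace format are all hypothetical.

```python
def utility(weights, outcome):
    """Weighted sum of outcome measures; both are dicts keyed by measure."""
    return sum(weights[k] * outcome.get(k, 0.0) for k in weights)


def persona_action(weights, options):
    """Greedy choice: the action whose predicted outcome has highest utility."""
    return max(options, key=lambda action: utility(weights, options[action]))


def agreement(weights, trace):
    """Fraction of recorded decisions where persona and human agree."""
    same = sum(persona_action(weights, options) == human
               for options, human in trace)
    return same / len(trace)


# Hypothetical personas as utility vectors over outcome measures.
treasure_collector = {"treasure": 1.0, "monsters": 0.0, "turns": -0.1}
runner = {"treasure": 0.0, "monsters": 0.0, "turns": -1.0}

# Hypothetical play trace: (available actions with outcomes, human's choice).
trace = [
    ({"go_exit": {"turns": 3},
      "grab_treasure": {"treasure": 1, "turns": 6}}, "grab_treasure"),
    ({"go_exit": {"turns": 2},
      "fight_monster": {"monsters": 1, "turns": 4}}, "go_exit"),
]
collector_match = agreement(treasure_collector, trace)
runner_match = agreement(runner, trace)
```

The persona with the higher agreement is the better description of this player; searching over the weight values to maximize `agreement` corresponds to the cloning idea mentioned above.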


5.7.2 Player Experience

The modeling of player experience involves learning a set of target outputs that approximate
the experience (as opposed to the behavior) of the player. By definition,
that which is being modeled (experience) is of subjective nature and the modeling
therefore requires target outputs that somehow approximate the ground truth of
experience. A model of player experience predicts some aspect of the experience a
player would have in some game situation, and learning such models is naturally a
supervised learning problem. As mentioned, there are many ways this can be done,
with approaches to player experience modeling varying regarding the inputs (from
what the experience is predicted, e.g., physiology, level design parameters, playing
style or game speed), the outputs (what sort of experience is predicted, e.g., fun,
frustration, attention or immersion) and the modeling methodology.

(a) An artificial neural network mapping between a game state
(input) and plans (output). The input contains blue circles representing
distances to various important elements of the game
and a red circle representing hit points of the player character.

(b) A level of MiniDungeons 2 depicting the current state of the game.

Fig. 5.12 An example of a procedural persona: In this example we evolve the weights of artificial
neural networks, one ANN per persona. The ANN takes observations of the player character and
its environment and uses these to choose from a selection of possible plans. During evolution the
utility function of each persona is used as the fitness function for adjusting its network weights.
Each individual of each generation is evaluated by simulating a full game. A utility function allows
us to evolve a network to pursue multiple goals across a range of different situations. The method
depends on the designer providing carefully chosen observations, appropriate planning algorithms,
and well-constructed utility functions. In this example the player opts to move towards a safe
treasure. This is illustrated with a green output neuron and highlighted corresponding weights (a)
and a green path on the game level (b).

In this section we will outline a few examples of supervised learning for modeling
the experience of players. To best cover the material (methods, algorithms, uses
of models) we rely on studies which have been thoroughly examined in the literature.
In particular, in the remainder of this section we outline the various approaches
and extensive methodology for modeling experience in two games: a variant of the 
popular Super Mario Bros (Nintendo, 1985) game and a 3D prey-predator game 
named Maze-Ball. 


5.7.2.1 Modeling Player Experience in Super Mario Bros 

Our first example builds upon the work of Pedersen et al. [521, 520] who modified
an open-source clone of the classic platform game Super Mario Bros (Nintendo,
1985) to allow for personalized level generation. That work is important in that it
set the foundations for the development of the experience-driven procedural content
generation framework [783], which constitutes a core research trend within procedural
content generation (see also Chapter 4).

The game used in this example is a modified version of Markus Persson’s Infinite
Mario Bros which is a public domain clone of Nintendo’s classic platform game
Super Mario Bros (Nintendo, 1985). All experiments reported in this example rely
on a model-free approach for the modeling of player experience. Models of player
experience are based on both the level played (game context) and the player’s playing
style. While playing, the game recorded a number of behavioral metrics of the
players, such as the frequency of jumping, running and shooting, that are taken into
consideration for modeling player experience. Further, in a follow-up experiment
[610], the videos of players playing the game were also recorded and used to extract
a number of useful visual cues such as the average head movement during play. The
output (ground truth of experience) for all experiments is provided as first-person,
rank-based reports obtained via crowdsourcing. Data was crowdsourced from hundreds
of players, who played pairs of levels with different level parameters (e.g.,
dissimilar numbers, sizes and placements of gaps) and were asked to rank which
of two levels best induced a number of player states. Across the several studies of
player experience modeling for this variant of Super Mario Bros (Nintendo, 1985)
collectively players have been asked to annotate fun, engagement, challenge, frustration,
predictability, anxiety, and boredom. These are the target outputs the player
model needs to predict based on the input parameters discussed above.

Given the rank-based nature of the annotations the use of preference learning is
necessary for the construction of the player model. The collected data is used to train
artificial neural networks that predict the players’ experiential states, given a player’s
behavior (and/or affective manifestations) and a particular game context, using evolutionary
preference learning. In neuroevolutionary preference learning [763], a
genetic algorithm evolves an artificial neural network so that its output matches the
pairwise preferences in the data set. The input of the artificial neural network is
a set of features that have been extracted from the data set; as mentioned earlier,
the input may include gameplay and/or objective data in this example. It is worth
noting that automatic feature selection is applied to pick the set of features (model
input) that are relevant for predicting various aspects of player experience. The genetic
algorithm implemented uses a fitness function that measures the difference
between the reported preferences and the relative magnitude of the model output.
Neuroevolutionary preference learning has been used broadly in the player modeling
literature and the interested reader may refer to the following studies (among
many): ll432l IMOll76^ 15211 l520ll772l . 
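
The idea of evolving a model so that its outputs agree with pairwise preferences can be sketched as follows. This is a deliberately small illustration: a single linear unit stands in for the artificial neural network, the fitness is simply the fraction of correctly ordered pairs (a simplification of the fitness function described above), and the evolutionary loop, parameter values and data are assumptions.

```python
import random


def model_output(weights, features):
    """Stand-in for the ANN: a single linear unit."""
    return sum(w * f for w, f in zip(weights, features))


def preference_fitness(weights, pairs):
    """Fraction of pairs where the preferred item gets the larger output."""
    correct = sum(model_output(weights, preferred) > model_output(weights, other)
                  for preferred, other in pairs)
    return correct / len(pairs)


def evolve(pairs, dim, generations=50, pop_size=20, sigma=0.3, seed=0):
    """Tiny elitist evolutionary search over the weight vector."""
    rng = random.Random(seed)
    population = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half and refill with mutated copies of survivors.
        population.sort(key=lambda w: preference_fitness(w, pairs), reverse=True)
        parents = population[: pop_size // 2]
        children = [[w + rng.gauss(0, sigma) for w in rng.choice(parents)]
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=lambda w: preference_fitness(w, pairs))


# Hypothetical reports: (features of preferred level, features of other level).
pairs = [
    ([1.0, 0.2], [0.0, 0.9]),
    ([0.8, 0.5], [0.3, 0.1]),
    ([0.6, 0.0], [0.1, 0.7]),
]
best = evolve(pairs, dim=2)
best_fitness = preference_fitness(best, pairs)
```

Only the ordering of the two outputs in each pair matters for the fitness, which is why rank-based reports can be used directly without turning them into numeric ratings.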

The crowdsourcing experiment of Pedersen et al. [521, 520] resulted in data
(gameplay and subjective reports of experience) from 181 players. The best predictor
of reported fun reached an accuracy of around 70% on unseen subjective
reports of fun. The input of the neural network fun-model, obtained through automatic
feature selection, consists of the time Mario spent moving left, the number
of opponents Mario killed from stomping, and the percentage of level played in
the left direction. All three playing features appear to contribute positively to the
prediction of reported fun in the game. The best-performing model for challenge
prediction had an accuracy of approximately 78%. It is more complex than the best
fun predictor, using five features: time Mario spends standing still, jump difficulty,
coin blocks Mario pressed, number of cannonballs Mario killed, and Mario kills by
stomping. Finally, the best predictor for frustration reaches an accuracy of 89%. It
is indeed an impressive finding that a player experience model can predict (with
high accuracy) whether the player is frustrated by the current game by merely calculating
the time Mario spent standing still, the time Mario spent on its last life, the
jump difficulty, and the deaths Mario had from falling in gaps. The general findings
of Pedersen et al. [520] suggest that good predictors for experience can be found
if a preference learning approach is applied on crowdsourced reports of experience
and gameplay data. The prediction accuracies, however, depend on the complexity
of the reported state; arguably fun is a much more complicated and fuzzier notion
to report compared to challenge or frustration. In a follow-up study by Pedersen
et al. [521] the additional player states of predictability, anxiety and boredom were
predicted with accuracies of approximately 77%, 70% and 61%, respectively. The
same player experience methodology was tested on an even larger scale, soliciting
data from a total number of 780 players of the game [621]. Frequent pattern mining
algorithms were applied to the data to derive frequent sequences of player actions.
Using sequential features of gameplay the models reach accuracies of up to 84% for
engagement, 83% for frustration and 79% for challenge.

Fig. 5.13 Facial feature tracking for head movement. Image adapted from [610].

In addition to the behavioral characteristics the visual cues of the player can be
taken into account as objective input to the player model. In [610] visual features
were extracted from videos of 58 players, both throughout whole game sessions and
during small periods of critical events such as when a player completes a level or
when the player loses a life (see Figs. 5.13 and 5.14). The visual cues enhance the
quality of the information we have about a player’s affective state which, in turn,
allows us to better approximate player experience. Specifically, fusing the gameplay
and the visual reaction features as inputs to the artificial neural network we
achieve average accuracies of up to 84%, 86% and 84% for predicting reported engagement,
frustration and challenge, respectively. The key findings of [610] suggest
that players’ visual reactions can provide a rich source of information for modeling
experience preferences and lead to more accurate models of player experience.

(a) Winning

(b) Losing

(c) Experiencing challenge (d) Experiencing challenge

Fig. 5.14 Examples of facial expressions of Super Mario Bros (Nintendo, 1985) players for different
game states. All images are retrieved from the Platformer Experience Dataset (Se).


5.7.2.2 Modeling Player Experience in Maze-Ball 


Our second example for modeling player experience builds largely upon the extensive 
studies of Martínez et al. [434, 430, 435] who analyzed player experience using 
a simple 3D prey-predator game named Maze-Ball towards achieving affective-driven 
camera control in games. While the game is rather simple, the work on Maze-Ball 
offers a thorough analysis of player experience via a set of sophisticated techniques 
for capturing the psychophysiological patterns of players including preference 
learning, frequent pattern mining and deep convolutional neural networks. In 
addition, the dataset that resulted from these studies is publicly available for further 
experimentation and forms the basis of a number of suggested exercises for this book. 


Maze-Ball is a three-dimensional prey-predator game (see Fig. 5.15), similar to 
a 3D version of Pac-Man (Namco, 1981). The player (prey) controls a ball which 

(a) Maze-Ball (b) Space-Maze 

Fig. 5.15 Early Maze-Ball prototype (a) and a polished variant of the game (b) that 
features real-time camera adaptation 1^ . The games can be found and played at 
http://www.hectorpmartinez.com/. 


moves inside a maze hunted by 10 opponents (predators) moving around the maze. 
The goal of the player is to maximize her score by gathering as many tokens, scattered 
in the maze, as possible while avoiding being touched by the opponents in a 
predefined time window of 90 seconds. A detailed description of Maze-Ball gameplay 
can be found in [780]. 

Gameplay attributes and physiological signals (skin conductance and heart rate 
variability) were acquired from 36 players of Maze-Ball. Each subject played a predefined 
set of eight games for 90 seconds each, thus the total number of game sessions 
available is 288. Gameplay and physiological signals define the input of the player 
experience model. To obtain the ground truth of experience players self-reported 
their preference about game pairs they played using a rank-based questionnaire, in 
particular a 4-alternative forced choice (4-AFC) protocol [780]. They were asked to 
rank the two games of each game pair with respect to fun, challenge, boredom, frustration, 
excitement, anxiety and relaxation. These annotations are the target outputs 
the player model will attempt to predict based on the input parameters discussed 
above. 
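
To make the structure of these annotations concrete, the following sketch turns 4-AFC answers into ordered training pairs. It assumes that the four alternatives are "game A more than game B", "game B more than game A", "both equally" and "neither", and that only the first two yield a usable preference; the answer labels and session identifiers are hypothetical.

```python
def to_preference_pairs(reports):
    """Convert 4-AFC answers into (preferred, other) session pairs.

    Each report is (session_a, session_b, answer), where answer is one of
    "A", "B", "equal" or "neither" (labels assumed for this sketch).
    Only clear preferences yield a training pair.
    """
    pairs = []
    for session_a, session_b, answer in reports:
        if answer == "A":
            pairs.append((session_a, session_b))
        elif answer == "B":
            pairs.append((session_b, session_a))
        elif answer not in ("equal", "neither"):
            raise ValueError(f"unknown 4-AFC answer: {answer!r}")
    return pairs

# Session count stated in the text: 36 players, eight games each.
players, games_per_player = 36, 8
total_sessions = players * games_per_player

# Hypothetical answers for one affective state (e.g., frustration).
reports = [("s1", "s2", "A"), ("s3", "s4", "equal"), ("s5", "s6", "B")]
pairs = to_preference_pairs(reports)
```

One such list of pairs would be built per reported state, so the same sessions can feed seven separate models.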

Several features were extracted from the gameplay and physiological signals obtained. 
These included features related to kills and deaths in the game as well as 
features associated with the coverage of the level. For extracting features from the 
physiological signals the study considered their average, standard deviation, and 
maximum and minimum values. The complete feature list and the experimental protocol 
followed can be found in [780]. 
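
The four statistics named above can be computed per signal and per session in a few lines. This is a minimal sketch: the sample values are made up, and the use of the population (rather than sample) standard deviation is an assumption, since the text does not say which was used.

```python
from statistics import mean, pstdev

def signal_features(samples):
    """Summarize one physiological signal with the four statistics
    named in the text."""
    return {
        "mean": mean(samples),
        "std": pstdev(samples),  # population standard deviation (assumed)
        "max": max(samples),
        "min": min(samples),
    }

# Made-up skin conductance samples for a single game session.
skin_conductance = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
features = signal_features(skin_conductance)
```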

As in the example with the game variant of Super Mario Bros (Nintendo, 1985) 
the rank-based nature of the annotations requires the use of preference learning for 
the construction of the player experience model. Thus, the collected data is used to 
train neural networks that predict the player states, given a player’s gameplay behavior 
and its physiological manifestations, using evolutionary preference learning. 
The architecture of the neural network can be either shallow (using a simple multi-layered 
perceptron) or deep (using convolutional neural networks [430, 435]). Figure 
5.16 shows the different ways information from gameplay and physiology can be 
fused on a deep convolutional neural network which is trained via preference 
learning to predict player experience in any game experience dataset that contains 
discrete in-game events and continuous signals (e.g., the player’s skin conductance). 

Fig. 5.16 Three dissimilar approaches to deep multimodal fusion via convolutional neural networks. 
Gameplay events are fused with skin conductance in this example. The networks illustrated 
present two layers with one neuron each. The first convolutional layer receives as input a continuous 
signal at a high time resolution, which is further reduced by a pooling layer. The resulting 
signal (feature map) presents a lower time resolution. The second convolutional layer can combine 
this feature map with additional modalities at the same low resolution. In the convolution fusion 
network (left figure), the two events are introduced at this level as a pulse signal. In the pooling 
fusion network (middle figure), the events are introduced as part of the first pooling layer, resulting 
in a filtered feature map. Finally, in the training fusion network (right figure), the events affect 
the training process of the first convolutional layer, leading to an alternative feature map. Image 
adapted from [435]. 
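
The core idea of learning from pairwise preferences can be sketched without any neural network machinery. The example below fits a linear utility function with gradient descent on a logistic pairwise loss, so that the preferred session of each pair receives the higher utility. This is a swapped-in, much simpler technique than the evolutionary preference learning of neural networks used in the studies; the feature values, function names and hyperparameters are hypothetical.

```python
import math

def train_preference_model(pairs, n_features, epochs=200, lr=0.1):
    """Fit weights w so that utility(preferred) > utility(other).

    `pairs` is a list of (preferred_features, other_features) tuples.
    Minimizes the logistic loss log(1 + exp(-(u_pref - u_other))).
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        for preferred, other in pairs:
            diff = [p - o for p, o in zip(preferred, other)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # Derivative of the loss with respect to the margin.
            grad_scale = -1.0 / (1.0 + math.exp(margin))
            w = [wi - lr * grad_scale * di for wi, di in zip(w, diff)]
    return w

def utility(w, features):
    """Linear utility of one session's feature vector."""
    return sum(wi * xi for wi, xi in zip(w, features))

def pairwise_accuracy(w, pairs):
    """Fraction of pairs in which the preferred session scores higher."""
    correct = sum(utility(w, a) > utility(w, b) for a, b in pairs)
    return correct / len(pairs)

# Hypothetical normalized features per session:
# [tokens collected, mean skin conductance]; the first item of each
# pair is the session reported as more frustrating.
pairs = [
    ([0.9, 0.2], [0.1, 0.3]),
    ([0.8, 0.5], [0.3, 0.4]),
    ([0.7, 0.1], [0.2, 0.6]),
    ([0.6, 0.9], [0.4, 0.8]),
]
w = train_preference_model(pairs, n_features=2)
accuracy = pairwise_accuracy(w, pairs)  # training accuracy only
```

The accuracy here is measured on the training pairs; a real study would report it on held-out pairs, for instance through cross-validation across players.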

Predictors of the player experience states can reach accuracies that vary from 
72% for challenge up to 86% for frustration using a shallow multi-layer perceptron 
player model [780]. Significant improvements are observed in those accuracies 
when the input space of the model is augmented with frequent sequences of in-game 
and physiological events (i.e., fusion on the input space). As in [621], Martínez et 
al. used GSP to extract those frequent patterns that were subsequently used as inputs 
of the player model [434]. Further accuracy improvements can be observed 
when gameplay is fused with physiological signals on deep architectures of convolutional 
neural networks [435]. Using deep fusion (see Fig. 5.16) accuracies of 
predicting player experience may surpass 82% for all player experience states considered. 
Further information about the results obtained in the Maze-Ball game can 
be found in the following studies: [780, 434, 430, 435, 436]. 


5.8 Further Reading 

For an extensive reading on game and player analytics (including visualization, 
data preprocessing, data modeling and game domain-dependent tasks) we refer the 
reader to the edited book by El-Nasr et al. [186]. When it comes to player modeling 
two papers offer complementary perspectives and taxonomies of player modeling 
and a thorough discussion on what aspects of players can be modeled and the ways 
players can be modeled: the survey papers of Smith et al. i36l and Yannakakis et 

al. fim . 


5.9 Exercises 

In this section we propose a set of exercises for modeling both the behavior and the 
experience of game players. For that purpose, we outline a number of datasets that 
can be used directly for analysis. Please note, however, that the book’s website will 
remain up to date with more datasets and corresponding exercises beyond the ones 
covered below. 


5.9.1 Player Behavior 

A proposed semester-long game data mining project is as follows. You have to 
choose a dataset containing player behavioral attributes and apply the necessary 
preprocessing to the data, such as feature extraction and feature selection. Then you 
must apply a relevant unsupervised learning technique for compressing, analyzing, 
or reducing the dimensionality of your dataset. Based on the outcome of unsupervised 
learning you will need to implement a number of appropriate supervised 
learning techniques that learn to predict a data attribute (or a set of attributes). We 
leave the selection of algorithms to the reader or the course instructor. Below we discuss 
a number of example datasets one might wish to start from; the reader, however, 
may refer to the book’s website for more options on game data mining projects. 
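
As one possible starting point for the first two steps of such a project, the sketch below standardizes a small feature table and clusters it with a plain k-means. The choice of k-means is only an example, since the selection of algorithms is left open; the feature names and values are hypothetical, and the supervised step is not covered.

```python
def standardize(rows):
    """Rescale each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)]
            for row in rows]

def k_means(rows, k, iterations=20):
    """Plain k-means; the first k rows seed the centroids so that the
    result is reproducible."""
    centroids = [list(r) for r in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, row in enumerate(rows):
            dists = [sum((a - b) ** 2 for a, b in zip(row, c))
                     for c in centroids]
            labels[i] = dists.index(min(dists))
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids

# Hypothetical per-player features: [playtime in hours, deaths per level].
players = [[1.0, 9.0], [20.0, 1.0], [2.0, 8.0],
           [22.0, 2.0], [1.5, 10.0], [19.0, 1.5]]
labels, centroids = k_means(standardize(players), k=2)
```

The resulting cluster labels could then be used as an additional input feature, or as the target attribute, of the supervised learning step.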


5.9.1.1 SteamSpy Dataset 

SteamSpy (http://steamspy.com/) is a rich dataset of thousands of games released on 
Steam (http://store.steampowered.com/), each described by several attributes. While 
not strictly a dataset focused on player modeling, SteamSpy offers an accessible and 
large dataset for game analytics. The data attributes of each game include the game’s 
name, the developer, the publisher, the score rank of the game based on user reviews, 
the number of owners of the game on Steam, the people that have played this game 
since 2009, the people that have played this game in the last two weeks, the average 
and median playtime, the game’s price and the game’s tags. The reader may use an 
API to download all data attributes from all games contained in the dataset. Then 
one might wish to apply supervised learning to be able to predict an attribute (e.g., 
the game’s price) based on other game features such as the game’s score, release date 
and tags. Or alternatively, one might wish to construct a score predictor of a new 
game. The selection of the modeling task and the AI methods is left to the reader. 
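
A price predictor of the kind suggested above could start from something as simple as a one-variable least-squares fit. The records below are made up and their keys are hypothetical; they do not reflect the field names returned by the SteamSpy API, and a real model would use many more features.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up game records with a review score and a price.
games = [
    {"score": 60, "price": 5.0},
    {"score": 70, "price": 10.0},
    {"score": 80, "price": 15.0},
    {"score": 90, "price": 20.0},
]
slope, intercept = fit_line([g["score"] for g in games],
                            [g["price"] for g in games])

# Predicted price of a hypothetical new game with a score of 75.
predicted_price = slope * 75 + intercept
```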


5.9.1.2 StarCraft: Brood War Repository 

The StarCraft: Brood War repository contains a number of datasets that include 
thousands of professional StarCraft replays. The various data mining papers, datasets 
as well as replay websites, crawlers, packages and analyzers have been compiled by 
Alberto Uriarte at Drexel University. In this exercise you are faced with the challenge 
of mining game replays with the aim to predict a player’s strategy. Some 
results on the StarCraft: Brood War datasets can be found in [750, 728, 570] among 
others. 


5.9.2 Player Experience 

As a semester project on player experience modeling it is suggested you choose 
a game, one or more affective or cognitive states to model (the model’s output) and 
one or more input modalities. You are expected to collect empirical data using your 
selected game and build models for the selected psychological state of the players 
that rely on the chosen input modalities. 

As a smaller project that does not involve data collection you may opt to choose 
one of the following datasets and implement a number of AI methods that will derive 
accurate player experience models. The models should be compared in terms of 
a performance measure. The two datasets accompanying this book and outlined below 
are the platformer experience dataset and the Maze-Ball dataset. The book’s 
website will be up to date with more datasets and exercises beyond the ones covered 
below. 


5.9.2.1 Platformer Experience Dataset 

The extensive analysis of player experience in Super Mario Bros (Nintendo, 1985) 
and our wish to further advance our knowledge and understanding of player experience 
have led to the construction of the Platformer Experience Dataset [326]. This 
is the first open-access game experience corpus that contains multiple modalities of 
data from players of Infinite Mario Bros, a variant of Super Mario Bros (Nintendo, 
1985). The open-access database can be used to capture aspects of player experience 
based on behavioral and visual recordings of platform game players. In addition, 
the database contains aspects of the game context (such as level attributes), 
demographic data of the players and self-reported annotations of experience in two 
forms: ratings and ranks. 

Footnotes: 
SteamSpy API: http://steamspy.com/api.php 
StarCraft: Brood War data mining resources, available at: http://nova.wolfwork.com/dataMining.html 

Here are a number of questions you might wish to consider when attempting to 
build player experience models that are as accurate as possible: Which AI methods 
should I use? How should I treat my output values? Which feature extraction and 
selection mechanism should I consider? The detailed description of the dataset can 
be found here: http://www.game.edu.mt/PED/. The book website contains further 
details and a set of exercises based on this dataset. 


5.9.2.2 Maze-Ball Dataset 

As in the case of the Platformer Experience Dataset the Maze-Ball dataset is also 
publicly available for further experimentation. This open-access game experience 
corpus contains two modalities of data obtained from Maze-Ball players: their 
gameplay attributes and three physiological signals (blood volume pulse, heart rate 
and skin conductance). In addition, the database contains aspects of the game such as 
features of the virtual camera placement. Finally, the dataset contains demographic 
data of the players and self-reported annotations of experience in two forms: ratings 
and ranks. 

The aim, once again, is to construct the most accurate models of experience for 
the players of Maze-Ball. So, which modalities of input will you consider? Which 
annotations are more reliable for predicting player experience? How will your signals 
be processed? These are only a few of the possible questions you will encounter 
during your efforts. The detailed description of the dataset can be found 
here: http://www.hectorpmartinez.com/. The book website contains further details 
about this dataset. 


5.10 Summary 

This chapter focused on the use of AI for modeling players. The core reasons why 
AI should be used for that purpose are either to derive something about the players’ 
experience (how they feel in a game) or for us to understand something about their 
behavior (what they do in a game). In general we can model player behavior and 
player experience by following a top-down or a bottom-up approach (or a mix of 
the two). Top-down (or model-based) approaches have the advantage of solid theoretical 
frameworks usually derived from other disciplines or other domains than 
games. Bottom-up (or model-free) approaches instead rely on data from players and 
have the advantage of not assuming anything about players other than that player 
experience and behavior are associated with data traces left by the player and that 
these data traces are representative of the phenomenon we wish to explain. While a 
hybrid between model-based and model-free approaches is in many ways a desirable 
approach to player modeling, we focus on bottom-up approaches, where we provide 
a detailed taxonomy for the options available regarding the input and the output of 
the model, and the modeling mechanism per se. The chapter ends with a number 
of player modeling examples, for modeling both the behavior of players and their 
experience. 

The player modeling chapter is the last chapter of the second part of this book, 
which covered the core uses of AI in games. The next chapter introduces the third 
and last part of the book, which focuses on the holistic synthesis of the various AI 
areas, the various methods and the various users of games under a common game 
AI framework. 



Part III 
The Road Ahead 


Chapter 6 

Game AI Panorama 


This chapter attempts to give a high-level overview of the field of game AI, with 
particular reference to how the different core research areas within this field inform 
and interact with each other, both actually and potentially. For that purpose we first 
identify the main research areas and their sub-areas within the game AI field. We 
then view and analyze the areas from three key perspectives: (1) the dominant AI 
method(s) used under each area; (2) the relation of each area with respect to the end 
(human) user; and (3) the placement of each area within a human-computer (player-game) 
interaction perspective. In addition, for each of these areas we consider how 
it could inform or interact with each of the other areas; in those cases where we find 
that meaningful interaction either exists or is possible, we describe the character of 
that interaction and provide references to published studies, if any. 

The main motivation for writing this chapter is to help the reader understand 
how a particular area relates to other areas within this growing field, 
how the reader can benefit from knowledge created in other areas and how the reader 
can make her own research more relevant to other areas. To facilitate and foster 
synergies across active research areas we place all key studies into a taxonomy with 
the hope of developing a common understanding and vocabulary within the field of 
AI and games. The structure of this chapter is based on the first holistic overview of 
the game AI field presented in [785]. The book takes a new perspective on the key 
game AI areas given its educational and research focus. 

The main game AI areas and core subareas already identified in this book and 
covered in this chapter are as follows: 

• Play Games (see Chapter 3), which includes the subareas of Playing to Win and 
Playing for the Experience. Independently of the purpose (winning or experience) 
AI can control either the player character or the non-player character. 

• Generate Content (see Chapter 4), which includes the subareas of autonomous 
(procedural) content generation and assisted content generation. Please note that 
the terms assisted (procedural) content generation and mixed-initiative (procedural) 
content generation (as defined in Chapter 4) are used interchangeably in 
this chapter. 

• Model Players (see Chapter 5), which includes the subareas of player experience 
modeling and player behavior modeling, or else, game data mining [178]. 


© Springer International Publishing AG, part of Springer Nature 2018 
G. N. Yannakakis and J. Togelius, Artificial Intelligence and 
Games, https://doi.org/10.1007/978-3-319-63519-4_6 

The scope of this chapter is not to provide an inclusive survey of all game AI 
areas—the details of each area have been covered in preceding chapters of the 
book—but rather a roadmap of interconnections between them via representative 
examples. As research progresses in this field, new research questions will pop up 
and new methods be invented, and other questions and methods recede in impor- 
tance. We believe that all taxonomies of research fields are by necessity tentative. 
Consequently, the list of areas defined in this chapter should not be regarded as fixed 
and final. 


The structure of the chapter is as follows: In Section 6.1 we start by holistically 
analyzing the game AI areas within the game AI field and we provide three alternative 
views over game AI: one with respect to the methods used, one with respect 
to the end users within game research and development and one where we outline 
how each of the research areas fits within the player-game interaction loop of digital 
games. Then, Section 6.2 digs deeper into the research areas and describes each 
one of them in detail. Within the subsection describing each area, there is a short 
description of the area and a paragraph on the possible interactions with each of the 
other areas for which we have been able to identify strong or weak influences. The 
chapter ends with a section containing our key conclusions and vision for the future 
of the field. 


6.1 Panoramic Views of Game AI 


Analyzing any research field as a composition of various subareas with interconnections 
and interdependencies can be achieved in several different ways. In this 
section we view game AI research from three high-level perspectives that focus on 
the computer (i.e., the AI methods), the human (i.e., the potential end user of game 
AI) and the interaction between the key end user (i.e., player) and the game. In 
Section 6.2, instead, we outline each game AI area and present the interconnections 
between the areas. 

Game AI is composed of (a set of) methods, processes and algorithms in artificial 
intelligence as those are applied to, or inform the development of, games. Naturally, 
game AI can be analyzed through the method used by identifying the dominant AI 
approaches under each game AI area (see Section 6.1.1). Alternatively, game AI 
can be viewed from the game domain perspective with a focus on the end users 
of each game AI area (see Section 6.1.2). Finally, game AI is, by nature, realized 
through systems that entail rich human-computer interaction (i.e., games) and, thus, 
the different areas can be mapped to the interaction framework between the player 
and the game (see Section 6.1.3). 

Table 6.1 Dominant (•) and secondary (◦) AI methods for each of the core AI areas we cover in 
this book. The total number of methods used for each area appears at the bottom row of the table. 

                         | Play Games              | Generate Content        | Model Players
Method                   | Winning    | Experience | Autonomously | Assisted | Experience | Behavior
Behavior Authoring       | •          | •          |              |          |            |
Tree Search              | •          | ◦          | ◦            | ◦        |            |
Evolutionary Computation | •          | ◦          | •            | •        | •          |
Supervised Learning      | ◦          | •          |              |          | •          | •
Reinforcement Learning   | •          | ◦          |              |          |            |
Unsupervised Learning    |            |            |              | ◦        | ◦          | •
Total (Dominant)         | 5 (4)      | 5 (2)      | 2 (1)        | 3 (1)    | 3 (2)      | 2 (2)

6.1.1 Methods (Computer) Perspective 


The first panoramic view of game AI we present is centered around the AI methods 
used in the field. As the basis of this analysis we first list the core AI methods 
mostly used in the game AI field. The key methodology areas identified in Chapter 
2 include ad-hoc behavior authoring, tree search, evolutionary computation, reinforcement 
learning, supervised learning, and unsupervised learning. For each of the 
game AI areas investigated we have identified the AI methods that are dominant 
or secondary in the area. While the dominant methods represent the most popular 
techniques used in the literature, secondary methods represent techniques that have 
been considered in a substantial volume of studies but are not dominant. 

We have chosen to group methods according to what we perceive as a received 
taxonomy and following the structure of Chapter 2. While it would certainly be possible 
to classify the various methods differently, we argue that the proposed classification 
is compact (containing solely key methodology areas) and it follows standard 
method classifications in AI. While this taxonomy is commonly accepted, the 
lines can be blurred. In particular evolutionary computation, being a very general 
optimization method, can be used to perform supervised, unsupervised or reinforcement 
learning (more or less proficiently). The model-building aspect of reinforcement 
learning can be seen as a supervised learning problem (mapping from action 
sequences to rewards), and the commonly used tree search method Monte Carlo tree 
search can be seen as a form of TD learning. The result of any tree search algorithm 
can be seen as a plan, though it is often not guaranteed to lead to the desired end 
state. That the various methods have important commonalities and some overlap 
does not detract from the fact that each of them is clearly defined. 


Table 6.1 illustrates the relationship between game AI areas and corresponding 
methods. It is evident that evolutionary computation and supervised learning 
appear to be of dominant or secondary use in most game AI areas. Evolutionary 
computation is a dominant method for playing to win, for generating content (in an 
assisted/mixed-initiative fashion or autonomously), and for modeling players; it has 
also been considered for the design of believable play (play for experience) research. 
Supervised learning is of substantial use across the game AI areas and appears to be 
dominant in player experience and behavioral modeling, as well as in the area of AI 
that plays for experience. Behavior authoring, on the other hand, is useful solely for 
game-playing. Reinforcement learning and unsupervised learning find limited use 
across the game AI areas, being dominant only in AI that plays to win and in player 
behavior modeling, respectively. Finally, tree search finds use primarily in playing 
to win and it is also considered (as a form of planning) for controlling play for 
experience and in computational narrative (as part of autonomous or assisted PCG). 

Viewing Table 6.1 from the game AI areas’ perspective (columns) it seems that 
AI that plays games (either for winning or for the experience) defines the game AI 
area with the most diverse and richest palette of AI methods. On the contrary, procedural 
content generation is dominated solely by evolutionary computation, with tree 
search used to a secondary degree. It is important to state that the popularity of any AI 
method within a particular area is closely tied to the task performed or the goal in 
mind. For example, evolutionary computation is largely regarded as a computationally 
heavy process which is mostly used in tasks associated with offline training. As 
PCG so far mainly relies on content that is generated offline, evolutionary computation 
offers a good candidate method and the core approach behind search-based 
PCG [720]. If online learning is a requirement for the task at hand, however, other 
methods (such as reinforcement learning or pruned tree-search) tend to be preferred. 

Clearly the possibility space for future implementations of AI methods under 
particular game AI areas seems rather large. While particular methods have been 
traditionally dominant in specific areas for good reasons (e.g., planning in computational 
narrative) there are equally good reasons to believe that the research in a 
game AI area itself has been heavily influenced by (and limited to) its corresponding 
dominant AI methods. The empty cells of Table 6.1 indicate potential areas for 
exploration and offer us an alternative view of promising new intersections between 
game AI areas and methods. 
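
One way to work with Table 6.1 programmatically is to encode it as a small data structure, recompute its bottom row and list its empty cells. The sketch below is our own encoding of the table ("D" for dominant, "S" for secondary); the area labels are shortened and the function names are hypothetical.

```python
AREAS = ["Play: Winning", "Play: Experience", "Generate: Autonomously",
         "Generate: Assisted", "Model: Experience", "Model: Behavior"]

# "D" = dominant, "S" = secondary, "" = not used; columns follow AREAS.
TABLE = {
    "Behavior Authoring":       ["D", "D", "",  "",  "",  ""],
    "Tree Search":              ["D", "S", "S", "S", "",  ""],
    "Evolutionary Computation": ["D", "S", "D", "D", "D", ""],
    "Supervised Learning":      ["S", "D", "",  "",  "D", "D"],
    "Reinforcement Learning":   ["D", "S", "",  "",  "",  ""],
    "Unsupervised Learning":    ["",  "",  "",  "S", "S", "D"],
}

def totals(table):
    """Return (methods used, dominant methods) per area, i.e., the
    bottom row of the table."""
    result = []
    for col in range(len(AREAS)):
        marks = [row[col] for row in table.values()]
        result.append((sum(m != "" for m in marks), marks.count("D")))
    return result

def empty_cells(table):
    """List the (method, area) combinations with no entry in the table."""
    return [(method, AREAS[col])
            for method, row in table.items()
            for col, mark in enumerate(row) if mark == ""]

bottom_row = totals(TABLE)
unexplored = empty_cells(TABLE)
```

The entries of `unexplored`, for example reinforcement learning for autonomous content generation, correspond to the empty cells discussed above.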


6.1.2 End User (Human) Perspective 

The second panoramic view of the game AI field puts an emphasis on the end user 
of the AI technology or general outcome (product or solution). Towards that aim we 
investigate three core dimensions of the game AI field and classify all game AI areas 
with respect to the process AI follows, the game context under which algorithms 
operate and, finally, the end user that benefits most from the resulting outcome. The 
classes identified under the above dimensions are used as the basis of the taxonomy 
we propose. 

The first dimension (phrased as a question) refers to the AI process: In general, 
what can AI do within games? We identify two potential classes in this dimension: 
AI can model or generate. For instance, an artificial neural network can model 
a playing pattern, or a genetic algorithm can generate game assets. Given that AI 
can model or generate, the second dimension refers to the context: What can AI 
methods model or generate in a game? The two possible classes here are content 
and behavior. For example, AI can model a player’s affective state, or generate a 
level. Finally, the third dimension is the end user: AI can model, or generate, either 
content or behavior; but, for whom? The classes under the third dimension are the 
designer, the player, the AI researcher, and the producer/publisher. 

[Figure 6.1 column labels: DO (Process), WHAT (Context), FOR WHOM (End User), GAME AI AREA] 

Fig. 6.1 The end user perspective of the identified game AI areas. Each AI area follows a process 
(model or generate) under a context (content or behavior) for a particular end user (designer, 
player, AI researcher or game producer/publisher). Blue and red arrows represent the processes of 
modeling and generation, respectively. Modified graph from [785]. 

Note that the above taxonomy serves as a framework for classifying the game 
AI areas according to the end user and is, by no means, inclusive of all potential 
processes, contexts, and end users. For instance, one could claim that the producer’s 
role should be distinct from the publisher’s role and that a developer should also be 
included in that class. Moreover, game content could be further split into smaller 
sub-classes such as narrative, levels, etc. Nevertheless, the proposed taxonomy provides 
distinct roles for the AI process (model vs. generate), clear-cut 
classification for the context (content vs. behavior) and a high-level classification of 
the available stakeholders in game research and development (designer vs. player 
vs. AI researcher vs. producer/publisher). The taxonomy presented here is a modified 
version of the one introduced in [785] and it does not consider evaluation as a 
process for AI since it is out of the primary scope of this book. 

Figure 6.1 depicts the relationship between the game AI core areas, the subareas 
and the end users in game research and development. Assisted, or mixed-initiative, 
content generation is useful for the designer and entails all possible combinations of 
processes and context as both content and behavior can be either modeled or generated 
for the designer. Compared to the other stakeholders the player benefits directly 
from more game AI research areas. In particular the player and her experience are 
affected by research on player modeling, which results from the modeling of experience 
and behavior; research on autonomous procedural content generation, as 
a result of generation of content; and studies on NPC playing (for winning or experience) 
resulting from the generation of behavior. The player character (PC)-based 
game playing (for winning or experience) areas provide input to the AI researcher 
primarily. Finally, the game producer/publisher is primarily affected by results on 
behavioral player modeling, game analytics and game data mining as a result of 
behavior modeling. 


6.1.3 Player-Game Interaction Perspective 


The third and final panoramic perspective of game AI presented in this section 
couples the computational processes with the end user within a game and views 
all game AI areas through a human-computer interaction (or, more accurately, a 
player-game interaction) lens. The analysis builds on the findings of Section 6.1.2 
and places the five game AI areas that concern the player as an end user on a player-game 
interaction framework as depicted in Fig. 6.2. Putting an emphasis on player 
experience and behavior, player modeling directly focuses on the interaction between 
a player and the game context. Game content is influenced primarily by research 
on autonomous procedural content generation. In addition to other types of 
content, most games feature NPCs, the behavior of which is controlled by some 
form of AI. NPC behavior is informed by research in NPCs that play the game to 
win or any other playing-experience purpose such as believability. 

Looking at the player-game interaction perspective of game AI it is obvious that 
the player modeling area has the most immediate and direct impact on the player 
experience as it is the only area linked to the player-game interaction directly. From 
the remaining areas, PCG influences player experience the most as all games have 
some form of environment representation and mechanics. Finally, AI that plays as 
an NPC (either to win or for the experience of play) is constrained to games that 
include agents or non-player characters. 

The areas not considered directly in this game AI perspective affect the player 
rather remotely. Research on AI tools that assist the generative process of content 
improves the game’s quality as a whole and in retrospect the player experience since 
designers tend to maintain a second-order player model [378] while designing. Finally, 
AI that plays the game as a player character can be offered for testing both the 
content and the NPC behaviors of a game, but also the interaction between the player 
and the game (via e.g., player experience competitions), but is mainly directed to AI 
researchers (see Fig. 6.1). 


Fig. 6.2 Game AI areas and sub-areas viewed from a player-game interaction perspective.


6.2 How Game AI Areas Inform Each Other

In this section, we outline the core game AI areas and discuss how they inform or
influence (the terms are used interchangeably) each other. All research areas could
be seen as potentially influencing each other to some degree; however, making a list
of all such influences would be impractical and the result would be uninteresting.
Therefore we only describe direct influences. Direct influences can be either strong
(represented by a • as the bullet point style next to the corresponding influence in
the following lists) or weak (represented by a ◦). We do not list influences we do
not consider potentially important for the informed research area, or which only go
through a third research area.

The sections below list outgoing influence. Therefore, to know how area A
influences area B you should look in the section describing area A. Some influences
are mutual, some not. The notation A → B in the headings of this section denotes
that “A influences B”. In addition to the text description each section provides a
figure representing all outgoing influences of the area as arrows. Dark and light gray
colored areas represent, respectively, strong and weak influence. Areas with white
background are not influenced by the area under consideration. The figures also
depict the incoming influences from other areas. Incoming strong and weak influences
are represented, respectively, with a solid line and a dashed line around the game
AI areas that influence the area under consideration. Note that the description of the
incoming influence from an area is presented in the corresponding section of that
area.


6.2.1 Play Games 

The key area in which AI plays a game (as covered in Chapter 3) involves the
subareas of Playing to Win and Playing for Experience. As mentioned earlier in the
chapter, the AI can control either player or non-player characters of the game. We
cover the influences to (and from) these subareas of game AI in this section.


6.2.1.1 Playing to Win (as a Player or as a Non-Player) 

As already seen in Chapter 3, research in AI that learns to play (and win) a game
focuses on using reinforcement learning techniques such as temporal difference
learning or evolutionary algorithms to learn policies/behaviors that play games
well, whether it is a PC or an NPC playing the game. From the very beginning of AI
research, reinforcement learning techniques have been applied to learn how to play
board games (see for example Samuel’s Checkers player [591]). Basically, playing
the game is seen as a reinforcement learning problem, with the reinforcement tied
to some measure of success in the game (e.g., the score, or length of time survived).
As with all reinforcement learning problems, different methods can be used to solve
the problem (find a good policy) [715], including TD learning [689], evolutionary
computation [406], competitive co-evolution 0^ [538, 589, 580], simulated
annealing [42], other optimization algorithms and a large number of combinations between
such algorithms [339]. In recent years a large number of papers that describe the
application of various learning methods to different types of video games have
appeared in the literature (including several overviews [470, 406, 632, 457]). Finally,
using games to develop artificial general intelligence builds on the idea that games
can be useful environments for algorithms to learn complex and useful behaviors;
thus research in algorithms that learn to win is essential.

It is also worth noting that most existing game-based benchmarks measure how
well an agent plays a game (see for example [322, 404, 504]). Methods for learning
to play a game are vital for such benchmarks, as the benchmarks are only meaningful
in the context of the algorithms. When algorithms are developed that “beat” existing
benchmarks, new benchmarks need to be developed. For example, the success of an
early planning agent in the first Mario AI competition necessitated that the software
be augmented with a better level generator for the next competition [322], and for
the Simulated Car Racing competition, the performance of the best agents on the
original competition game spurred the change to a new more sophisticated racing
game 0710113^.

Fig. 6.3 Playing to Win: influence on (and from) other game AI research areas. Outgoing
influence (represented by arrows): black and dark gray colored areas reached by arrows represent,
respectively, strong and weak influence. Incoming influence is represented by red lines around
the areas that influence the area under investigation (i.e., AI that plays to win in this figure): strong
and weak influences are represented, respectively, by a solid and a dashed line.


Research in this area impacts game AI at large as three game AI subareas are
directly affected; in turn, one subarea is directly affecting AI that plays to win (see
Fig. 6.3).

• Playing to Win → Playing for the Experience: An agent cannot be believable,
or serve to augment the game’s experience, if it is not proficient. Being able
to play a game well is in several ways a precondition for playing games in a
believable manner, though well-playing agents can be developed without learning
(e.g., via top-down approaches). In recent years, successful entries to competitions
focused on believable agents, such as the 2K BotPrize and the Mario AI
Championship Turing test track, have included a healthy dose of learning
algorithms [719, 603].

• Playing to Win → Generate Content (Autonomously): Having an agent that
is capable of playing a game proficiently is useful for simulation-based testing
in procedural content generation, i.e., the testing of newly generated game
content by playing through that content with an agent. For example, in a program
generating levels for the platform game Super Mario Bros (Nintendo, 1985), the
levels can be tested by allowing a trained agent to play them; those that the agent
cannot complete can be discarded [335]. Browne’s Ludi system, which generates
complete board games, evaluates these games through simulated playthrough and
uses learning algorithms to adapt the strategy to each game Gl.

• Playing to Win → Generate Content (Assisted): Just as with autonomous
procedural content generation, many tools for AI-assisted game design rely on being
able to simulate playthroughs of some aspect of the game. For instance, the
Sentient Sketchbook tool for level design uses simple simulations of game-playing
agents to evaluate aspects of levels as they are being edited by a human
designer [379]. Another example is the automated playtesting framework named
Restricted Play [295], which aims mostly at assisting designers on aspects of
game balance during game design. A form of Restricted Play is featured in the
Ludocore game engine im.
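
The simulation-based testing described in the bullets above can be sketched as a generate-and-test loop. This is only a minimal illustration under our own assumptions: the level representation (a list of gap flags), the generator `generate_level` and the rule-based stand-in for a trained agent `agent_can_complete` are hypothetical and do not come from any of the cited systems.

```python
import random


def generate_level(rng, length=20, gap_prob=0.3):
    """Hypothetical generator: a level is a list of booleans, True = gap tile."""
    return [rng.random() < gap_prob for _ in range(length)]


def agent_can_complete(level, max_jump=2):
    """Stand-in for a trained agent: it can clear any run of at most
    max_jump consecutive gap tiles and fails on longer runs."""
    run = 0
    for is_gap in level:
        run = run + 1 if is_gap else 0
        if run > max_jump:
            return False
    return True


def generate_playable_levels(n, seed=0):
    """Generate levels and discard those the agent cannot complete."""
    rng = random.Random(seed)
    kept = []
    while len(kept) < n:
        level = generate_level(rng)
        if agent_can_complete(level):
            kept.append(level)
    return kept


if __name__ == "__main__":
    for level in generate_playable_levels(3):
        print("".join("_" if gap else "#" for gap in level))
```

In a real system the playability check would be a playthrough by a game-playing agent rather than a hand-written rule; the filtering structure stays the same.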


6.2.1.2 Playing for the Experience (as a Player or as a Non-Player) 


Research on AI that plays a game for a purpose other than winning is central to
studies where playing the game well is not the primary research aim. AI can play
the game as a player character attempting to maximize the believability value of
play as, for instance, in [719, 619, 96]. It can alternatively play the game in the role of
an NPC for the same purpose [268]. Work under this research subarea involves the
study of believability, interestingness or playing experience in games and the
investigation of mechanisms for the construction of agent architectures that appear to
have, e.g., believable or human-like characteristics. The approaches for developing
such architectures can be either top-down behavior authoring (such as the FAtiMA
model used in My Dream Theatre [100] and the Mind Module model [191] used in
The Pataphysic Institute) or bottom-up, attempting to imitate believable gameplay
from human players, such as the early work of Thurau et al. in Quake II (Activision,
1997) bots [696], the human imitation attempts in Super Mario Bros (Nintendo,
1985) [511], the Unreal Tournament 2004 (Epic Games, 2004) believable bots of
Schrum et al. [603] and the crowdsourcing studies of the Restaurant game [508].
Evidently, commercial games have long benefited from agent believability
research. Examples of this include popular games such as the Sims (Electronic Arts,
2000) series. The industry puts a strong emphasis on the design of believability in
games as this contributes to more immersive game environments. The funding of
believability research through game AI competitions such as the 2K BotPrize is one
of the many clear indicators of the commercial value of agent believability.

Over the last few years there has been a growing academic (and commercial)
interest in the establishment of competitions that can be used as assessment tools
for agent believability [719]. Agent believability research has provided input and
given substance to those game benchmarks. A number of game Turing competitions
have been introduced to the benefit of agent believability research, including the
2K BotPrize on the Unreal Tournament 2004 (Epic Games, 2004) [647, 264] game
and the Mario AI Championship Turing test track [619] on the Super Mario Bros
(Nintendo, 1985) game. Recently, the community saw AI agents passing the Turing
test in the 2K BotPrize [603].

The study of AI that plays games not for winning, but for other purposes, affects 
research on three other game AI areas as illustrated in Fig. 6.4 whereas it is affected 
by four other game AI areas. 


Fig. 6.4 Playing for the Experience: influence on (and from) other game AI research areas.

• Playing for the Experience → Model Players (Experience and Behavior):
There is a direct link between player modeling and believable agents, as research
carried out for the modeling of human, human-like, and supposedly believable
playing behavior can inform the construction of more appropriate models for
players. Examples include the imitation of human play styles in Super Mario
Bros (Nintendo, 1985) [511] and Quake II (Activision, 1997) [696]. Though
computational player modeling uses learning algorithms, it is only in some cases
that it is the behavior of an NPC that is modeled. In particular, this is true when
the in-game behavior of one or several players is modeled. This can be done using
either reinforcement learning techniques, or supervised learning techniques such
as backpropagation or decision trees. In either case, the intended outcome for the
learning algorithm is not necessarily an NPC that plays as well as possible, but
one that plays in the style of the modeled player [735, 511].

• Playing for the Experience → Generate Content (Autonomously): Believable
characters may contribute to better levels [96], more believable stories
[801, 401, 531] and, generally, better game representations [563]. A typical
example of the integration of characters in the narrative and the drive of the
latter based on the former includes the FAtiMA agents in FearNot! [516] and My
Dream Theater [100]. Another example is the generation of Super Mario Bros
(Nintendo, 1985) levels that maximize the believability of any Mario player [96].


6.2.2 Generate Content 

As covered in detail in Chapter 4, AI can be used to design whole (or parts of) games
in an autonomous or in an assisted fashion. This core game AI area includes the
subareas of autonomous (procedural) content generation and assisted or
mixed-initiative (procedural) content generation. The interactions of these subareas with
the remaining areas of game AI are covered in this section.

Fig. 6.5 Generate Content (Autonomously): influence on (and from) other game AI research areas.


6.2.2.1 Generate Content (Autonomously) 

As stated in Chapter 4, procedural content generation has been included in limited
roles in some commercial games since the early 1980s; however, recent years have
seen an expansion of research on more controllable PCG for multiple types of game
content GMl, using techniques such as evolutionary search [720] and constraint
solving [638]. The influence of PCG research beyond games is already evident in
areas such as computational creativity [381] and interaction design (among others).
There are several surveys of PCG available, including a recent book [616] and vision
paper [702], as well as surveys of frameworks [783], sub-areas of PCG [554, 732]
and methods [720, 638].

Autonomous content generation is one of the areas of recent academic research
on AI in games which bears most promise for incorporation into commercial
games. A number of recent games have been based heavily on PCG, including
independent (“indie”) game production successes such as Spelunky (Mossmouth,
2009) and Minecraft (Mojang, 2011), and mainstream AAA games such as Diablo
III (Blizzard Entertainment, 2012) and Civilization V (2K Games, 2010). A notable
example, as mentioned earlier in the book, is No Man’s Sky (Hello Games, 2016) with its
quintillion different procedurally generated planets. Some games heavily based on
PCG and developed by researchers have been released as commercial games on
platforms such as Steam and Facebook; two good examples of this are Petalz [565, 566]
and Galactic Arms Race [249].

Figure 6.5 depicts the three (and five) areas that are influenced by (and influence)
autonomous PCG.

• Generate Content (Autonomously) → Play to Win: If an agent is trained to
perform well in only a single game environment, it is easy to overspecialize the
training and arrive at a policy/behavior that will not generalize to other levels.
Therefore, it is important to have a large number of environments available for
training. PCG can help with this, potentially providing an infinite supply of test
environments. For example, when training players for the Mario AI
Championship it is common practice to test each agent on a large set of freshly generated
levels, to avoid overtraining [322]. There has also been research on adapting
NPC behavior specifically to generated content [332]. Finally, one approach to
artificial general intelligence is to train agents to be good at playing games in
general, and test them on a large variety of games drawn from some genre or
distribution. To avoid overfitting, this requires games to be generated automatically,
a form of PCG [598]. The generation of new environments is very important
for NPC behavior learning, and this extends to benchmarks that measure some
aspect of NPC behavior. Apart from the Mario AI Championship, competitions
such as the Simulated Car Racing Championship use freshly generated tracks,
unseen by the participants, to test submitted controllers [102]. But there is also
scope for benchmarks and competitions focused on measuring the capabilities of
PCG systems themselves, such as the Level Generation track of the Mario AI
Championship [620].

◦ Generate Content (Autonomously) → Play for the Experience: Research on
autonomous PCG naturally influences research on agent (PC or NPC) control for
believability, interestingness or other aims aside from winning, given that these
agents are performing in a particular environment and under a specific game
context. This influence is still in its infancy and the only study we can point the
reader to is the one by Camilleri et al. [96], where the impact of level design on
player character believability is examined in Super Mario Bros (Nintendo, 1985).
Further, research on interactive narrative benefits from and influences the use of
believable agents that interact with the player and are interwoven in the story
plot. The narrative can yield more (or less) believability to agents and thus the
relationship between the behavior of the agents and the emergent story is strong
[801, 401, 531]. In that sense, the computational narrative of a game may define
the arena for believable agent design.

• Generate Content (Autonomously) → Generate Content (Assisted): As
content design is a central part of game design, many AI-assisted design tools
incorporate some form of assisted content design. Examples include Tanagra, which
helps designers create complete platform game levels which ensure playability
through the use of constraint solvers [641], and SketchaWorld [634].
Another example is Sentient Sketchbook, which assists humans in designing strategy
game levels through giving immediate feedback on properties of levels and
autonomously suggesting modifications [379].
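
The point made above about testing on freshly generated levels to avoid overtraining can be illustrated with a toy comparison. The sketch below is ours and rests on strong simplifying assumptions: levels are lists of gap flags, an agent is a function returning whether it jumps at a tile, and the names `memorizing_agent`, `reactive_agent` and `success_rate` are hypothetical. It contrasts a policy that has overspecialized to one training level with one that reacts to the level in front of it.

```python
import random


def generate_level(rng, length=30, gap_prob=0.25):
    """Hypothetical generator: True marks a gap tile that must be jumped."""
    return [rng.random() < gap_prob for _ in range(length)]


def memorizing_agent(training_level):
    """Overspecialized policy: replays the jump positions of one training level."""
    return lambda level, i: training_level[i]


def reactive_agent(level, i):
    """General policy: jump exactly when the current tile is a gap."""
    return level[i]


def completes(agent, level):
    """The agent survives if it jumps at every gap tile (extra jumps are harmless)."""
    return all(agent(level, i) for i, is_gap in enumerate(level) if is_gap)


def success_rate(agent, n_levels, seed):
    """Fraction of freshly generated, previously unseen levels the agent completes."""
    rng = random.Random(seed)
    levels = [generate_level(rng) for _ in range(n_levels)]
    return sum(completes(agent, level) for level in levels) / n_levels


if __name__ == "__main__":
    training_level = generate_level(random.Random(0))
    memorizer = memorizing_agent(training_level)
    print("memorizer on its training level:", completes(memorizer, training_level))
    print("memorizer on fresh levels:", success_rate(memorizer, 100, seed=1))
    print("reactive agent on fresh levels:", success_rate(reactive_agent, 100, seed=1))
```

Evaluating only on the training level would not distinguish the two agents; the fresh levels do.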


6.2.2.2 Generate Content (Assisted) 

Assisted content generation refers to the development of AI-powered tools that
support the game design and development process. This is perhaps the AI research area
which is most promising for the development of better games [764]. In particular,
AI can assist in the creation of game content varying from levels and maps to
game mechanics and narratives. The impact of AI-enabled authoring tools on design
and development influences the study of AI that plays games for believability,
interestingness or player experience, and research in autonomous procedural content
generation (see Fig. 6.6). AI-assisted game design tools range from those designed
to assist with generation of complete game rulesets such as MetaGame [522] or
RuLearn [699] to those focused on more specific domains such as strategy game
levels USD, platform game levels [642], horror games [394] or physics-based
puzzles [613].

Fig. 6.6 Generate Content (Assisted): influence on (and from) other game AI research areas.

It is worth noting that AI tools have been used extensively for supporting design
and commercial game development. Examples such as the SpeedTree (Interactive
Data Visualization Inc., 2013) generator for trees and other plants [287] have seen
uses in several game productions. The mixed-initiative PCG tools mentioned above
have a great potential in the near future as most of these are already tested on
commercial games or developed with game industrial partners. Furthermore, there are
tools designed for interactive modeling and analysis of game rules and mechanics,
which are not focused on generating complete games but on prototyping and
understanding aspects of complex games; such systems could be applied to existing
commercial games [639].

◦ Generate Content (Assisted) → Play for the Experience: Authoring tools in
forms of open-world sandboxes could potentially be used for the creation of more
believable behaviors. While this is largely still an unexplored area of research
and development, notable attempts include the NERO game AI platform where
players can train game agents for efficient and believable first-person shooter
bot behaviors [654]. An open version of this platform focused on crowdsourcing
behaviors has been released recently [327]. A similar line of research is the
generation of Super Mario Bros (Nintendo, 1985) players by means of interactive
evolution [648].

• Generate Content (Assisted) → Generate Content (Autonomously): Research
on methods of mixed-initiative co-creation [774] and design can feed
input to and spur discussion on central topics in procedural content
generation. Given the importance of content design in the development process as
a whole, any form of mixed-initiative AI assistance in the generation process
can support and augment procedural content generation. Notable examples of
mixed-initiative PCG include the Tanagra platform game level design AI
assistant [641], and the SketchaWorld [634], the Sentient World [80], the Sentient
Sketchbook [379, 774] and the Sonancia [394] systems, which generate game
maps and worlds in a mixed-initiative design fashion following different
approaches and levels of human computation. Further, tools can assist the
authoring of narrative in games. In particular, drama management tools have long
been investigated within the game AI community. An academic example is ABL,
which has allowed the authoring of narrative in Façade [441]. Among the few
available and well-functional authoring tools the most notable are the Versu [197]
storytelling system, which was used in the game Blood & Laurels (Emily Short,
2014), and the Inform 7 [480] software package that led to the design of Mystery
House Possessed (Emily Short, 2005). More story generation tools as such can
be found at the http://storygen.org/ repository, by Chris Martens and Rogelio E.
Cardona-Rivera.


6.2.3 Model Players 

As already explored in Chapter 5, modeling players involves the subtasks of
modeling their behavior or their experience. Given the interwoven nature of these two
tasks we present their influences to (and from) other game AI areas under one
common section. In player modeling [782, 636], computational models are created for
detecting how the player perceives and reacts to gameplay. As stated in Chapter
5, such models are often built using machine learning methods where data
consisting of some aspect of the game or player-game interaction is associated with
labels derived from some assessment of player experience, gathered for example
from questionnaires [781]. However, the area of player modeling is also concerned
with structuring observed player behavior even when no correlates to experience
are available, e.g., for identifying player types or predicting player behavior.

Research and development in player modeling can inform attempts for player
experience in commercial-standard games. Player experience detection methods and
algorithms can advance the study of user experience in commercial games. In
addition, the appropriateness of sensor technology, the technical plausibility of
biofeedback sensors, and the suitability of various modalities of human input can inform
industrial developments. Quantitative testing via game metrics (varying from
behavioral data mining to in-depth low scale studies) is also improved [764, 781, 86].
By now, a considerable number of academic studies directly use datasets from
commercial games to induce models of players that could inform further development of
the game. For example, we refer the reader to the experiments in clustering players
of Tomb Raider: Underworld (Square Enix, 2008) into archetypes [176] and
predicting their late-game performance based on early-game behavior gT4l. Examples
of player modeling components within high-profile commercial games include the
arousal-driven appearance of NPCs in Left 4 Dead 2 (Valve Corporation, 2009), the
fearful combat skills of the opponent NPCs in F.E.A.R. (Monolith, 2005), and the
avatars’ emotion expression in the Sims series (Maxis, 2000) and Black and White
(Lionhead Studios, 2001). A notable example of a game that is based on player
experience modeling is Nevermind (Flying Mollusk, 2016); the game adapts its content
based on the stress of the player, which is manifested via a number of physiological
sensors.

Fig. 6.7 Model Players: influence on (and from) other game AI research areas.

Player modeling is considered to be one of the core non-traditional uses of AI
in games [764] and affects research in AI-assisted game design, believable agents,
computational narrative and procedural content generation (see Fig. 6.7).

• Model Players → Play for the Experience: Player models can inform and
update believable agent architectures. Models of behavioral, affective and cognitive
aspects of gameplay can improve the human-likeness and believability of any
agent controller, whether it is ad-hoc designed or built on data derived from
gameplay. While the link between player modeling and believable agent design
is obvious and direct, research efforts towards this integration within games are
still sparse. However, the few efforts made on the imitation of human game
playing for the construction of believable architectures have resulted in successful
outcomes. For example, human behavior imitation in platform [511] and racing
games [735, 307] has provided human-like and believable agents, while similar
approaches for developing Unreal Tournament 2004 (Epic Games, 2004) bots
(e.g., in [328]) recently managed to pass the Turing test in the 2K BotPrize
competition. Notably, one of the two agents that passed the Turing test in 2K BotPrize
managed to do so by imitating (mirroring) aspects of human play [535]. A line
of work that stands in between player modeling and playing games for the
experience is the study on procedural personas [268, 267, 269]. As introduced
earlier in the book, procedural personas are NPCs that are trained to imitate realistically
the decision making process of humans during play. Their study both influences
our understanding about the internal (cognitive) processes of playing behavior
and advances our knowledge on how to build believable characters in games.

• Model Players → Generate Content (Autonomously): There is an obvious
link between computational models of players and PCG as player models can
drive the generation of new personalized content for the player. The
experience-driven role of PCG [783], as covered in Chapter 4, views game content as an
indirect building block of a player’s affective, cognitive and behavioral state and
proposes adaptive mechanisms for synthesizing personalized game experiences.
The “core loop” of an experience-driven PCG solution involves learning a model
that can predict player experience, and then using this model as part of an
evaluation function for evolving (or otherwise optimizing) game content; game content
is evaluated based on how well it elicits a particular player experience,
according to the model. Examples of PCG that are driven by player models include
the generation of game rules [716], camera profiles [780, 85] and platform game
levels [617]. Most work that goes under the label “game adaptation” can be said
to implement the experience-driven architecture; this includes work on adapting
the game content to the player using reinforcement learning ll2^ or semantic
constraint solving [398] rather than evolution. Player models may also inform
the generation of computational narrative. Predictive models of playing
experience can drive the generation of individualized scenarios in a game. Examples of
the coupling between player modeling and computational narrative include the
affect-driven narrative systems met in Façade [441] and FearNot! ESh, and the
affect-centered game narratives such as the one of Final Fantasy VII (Square,
1997).

◦ Model Players → Generate Content (Assisted): User models can enhance
authoring tools that, in turn, can assist the design process. The research area that
bridges user modeling and AI-assisted design is in its infancy and only a few
example studies can be identified. Indicatively, designer models [378] have been
employed to personalize mixed-initiative design processes [774, 377, 379]. Such
models may drive the procedural generation of designer-tailored content.
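
The “core loop” of experience-driven PCG described in the list above (learn a model of player experience, then use it as the evaluation function of a content optimizer) can be sketched in a few lines. Everything concrete here is an assumption made for illustration: the survey data in `SURVEY` is invented, a 1-nearest-neighbour regressor stands in for the learned experience model, and a simple (1+1) evolutionary hill climber stands in for the evolutionary search.

```python
import random

# Hypothetical survey data: (number of gap tiles in a level, reported fun rating)
SURVEY = [(0, 1.0), (2, 3.0), (4, 5.0), (6, 4.0), (8, 2.0), (10, 1.0)]


def predict_experience(level, data=SURVEY):
    """1-nearest-neighbour regressor standing in for a learned experience model.
    The only content feature used is the number of gap tiles in the level."""
    n_gaps = sum(level)
    nearest = min(data, key=lambda sample: abs(sample[0] - n_gaps))
    return nearest[1]


def mutate(level, rng):
    """Flip one randomly chosen tile between ground and gap."""
    child = list(level)
    i = rng.randrange(len(child))
    child[i] = not child[i]
    return child


def evolve_level(length=20, generations=500, seed=0):
    """(1+1) evolutionary search with the model's prediction as fitness."""
    rng = random.Random(seed)
    parent = [False] * length  # start from a flat level without gaps
    for _ in range(generations):
        child = mutate(parent, rng)
        # Keep the child if the model predicts an equal or better experience
        if predict_experience(child) >= predict_experience(parent):
            parent = child
    return parent


if __name__ == "__main__":
    best = evolve_level()
    print("".join("_" if gap else "#" for gap in best), predict_experience(best))
```

A personalized variant would fit the model to one player's data, so that the same search yields different content for different players.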


6.3 The Road Ahead 

This chapter has initially identified the currently most active areas and subareas
within game AI and placed them on three holistic frameworks: an AI method
mapping, a game stakeholder (end user) taxonomy and the player-game interaction loop.
This analysis revealed dominant AI algorithms within particular areas as well as
room for exploration of new methods within areas. In addition, it revealed the
dissimilar impact of different areas on different end users such as the AI researcher and
the designer and, finally, outlined the influence of the different game AI areas on the
player, the game and their interaction. From the high-level analysis of the game AI
field we moved on to the detailed analysis of the game AI areas that compose it and
thoroughly surveyed the meaningful interconnections between the different areas.
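
Because Section 6.2 lists only outgoing influences, finding what influences a given area means inverting those lists. As a reading aid, the sketch below transcribes the bullet lists of Section 6.2 into a small directed graph and performs that inversion. The dictionary layout, the shortened area names and the function `incoming` are our own choices for illustration, not notation used elsewhere in the book.

```python
STRONG, WEAK = "strong", "weak"

# Outgoing influences as listed in the bullets of Section 6.2:
# area -> {influenced area: strength}
INFLUENCES = {
    "Play to Win": {
        "Play for Experience": STRONG,
        "Generate Content (Autonomously)": STRONG,
        "Generate Content (Assisted)": STRONG,
    },
    "Play for Experience": {
        "Model Players": STRONG,
        "Generate Content (Autonomously)": STRONG,
    },
    "Generate Content (Autonomously)": {
        "Play to Win": STRONG,
        "Play for Experience": WEAK,
        "Generate Content (Assisted)": STRONG,
    },
    "Generate Content (Assisted)": {
        "Play for Experience": WEAK,
        "Generate Content (Autonomously)": STRONG,
    },
    "Model Players": {
        "Play for Experience": STRONG,
        "Generate Content (Autonomously)": STRONG,
        "Generate Content (Assisted)": WEAK,
    },
}


def incoming(area):
    """Invert the outgoing lists: which areas influence `area`, and how strongly?"""
    return {
        source: targets[area]
        for source, targets in INFLUENCES.items()
        if area in targets
    }


if __name__ == "__main__":
    for area in INFLUENCES:
        print(area, "<-", incoming(area))
```

The same structure makes it easy to compare the listed influences against all possible ordered pairs of areas.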

The total number of strong and weak influences is rather small compared to all 
possible interconnections between the areas, which clearly signals the research ca- 
pacity of the game AI field for further explorations. We can distinguish a number of 
connections which are currently very active, meaning that much work currently goes 
on in one area that draws on work in another area. Here we see, for example, the 
connection between AI that plays to win in a general fashion in conjunction with the 
use of tree search algorithms; the MCTS algorithm was invented in the context of 
board game-playing, proved to be really useful in the general game playing compe- 
tition, and is being investigated for use in games as different as StarCraft (Blizzard 
Entertainment, 1998) and Super Mario Bros (Nintendo, 1985). Improvements and 
modifications to the algorithm have been flowing back and forth between the vari- 
ous areas. Another indicative connection that is alive and strong is between player 
modeling and procedural content generation, where it is now common for newly 
devised PCG algorithms and experimental studies to include player behavioral or 
player experience models. 

One can also study the currently strong areas by trying to cluster the trending
topics in recent iterations of the IEEE CIG and AIIDE conferences. Such studies
always include some form of selection bias, as papers can usually be counted into
more than one area (e.g., depending on if you group by method or domain), but if
you start from the session groupings made by the program chairs of each conference
you achieve at least some inter-subjective validity. According to such a clustering,
the most active topics over the last few years have been player (or emotion)
modeling, game analytics, general game AI, real-time strategy game playing (especially
StarCraft (Blizzard Entertainment, 1998)) and PCG (in general). Another
perspective of the trend in game AI research is the varying percentage of studies on NPC
(or game agent) behavior learning over other uses of AI in games at the two key
conferences in the field (IEEE CIG and AIIDE). Our preliminary calculations
suggest that while, initially, AI was mainly applied for NPC control and for playing
board/card games well (more than 75% of CIG and AIIDE papers were linked to
NPC behavior and agent game playing in 2005), that trend has drastically changed
as entirely new (non-traditional) uses of AI became more common over the years;
e.g., roughly 52% of the papers in CIG and AIIDE in 2011 did not involve game
agents and NPC behavior. These facts indicate a shift in the use of AI in and for
games towards multiple non-traditional applications (which tend to be traditional
by now) for the development of better games [764].

But it is maybe even more interesting to look at all those connections that are
unexploited or underexploited or potentially strong. For example, player modeling
is potentially very important in the development of AI that controls believable,
interesting or curious agents, but this has not been explored in enough depth yet; the
same holds for the application of user (or else, designer) modeling principles
towards the personalization of AI-assisted game design. Believable agents have, in
turn, not been used enough in content generation (either autonomous or assisted).
A grand vision of game AI for the years to come is to let it identify its own role
within game design and development as it sees fit. In the last chapter of this book
we discuss frontier research topics as such and identify unexplored roles of AI in
games.


6.4 Summary 

We hope that with this chapter of this book, we have been able to give our readers 
a sense of how this—by now rather large and multifaceted—research field hangs 
together, and what could be done to integrate it further. We realize that this is only
our view of its dynamics and interconnections, and that there are (or could be) many 
competing views. We look forward to seeing those in upcoming studies in the field. 

Finally, it is important to note that it would have been impossible to provide a 
complete survey of all the areas as, first, the game AI field is growing rapidly and, 
second, it is not the core objective of the book. This means that the bibliography 
is indicative rather than exhaustive and serves as a general guideline for the reader. 
The website of the book, instead of the book per se, will be kept up to date regarding 
important new readings for each area. 

The next and final chapter of the book is dedicated to a few long-standing, yet 
rather unexplored, research frontiers of game AI. We believe that any advances made 
in these directions will lead to scientific breakthroughs not merely within game AI 
but largely in both games (their design, technology and analysis) and AI per se. 

Chapter 7

Frontiers of Game AI Research 


In this final chapter of the book we discuss a number of long-term visionary goals 
of game AI, putting an emphasis on the generality of AI and the extensibility of its 
roles within games. In particular, in Section 7.1 we discuss our vision for general
behavior for each one of the three main uses of AI in games. Play needs to become
general; generators are required to have general generative capacities across
games, content types, designers and players; models of players also need to showcase
general modeling abilities. In Section 7.2 we also discuss roles of AI that are
still unexplored and certainly worth investigating in the future. The book ends with
a discussion dedicated to general ethical considerations of game AI (Section 7.3).


7.1 General General Game AI 

As evidenced by the large volume of studies, the game AI research area has been
supported by an active and healthy research community for more than a decade—
at least since the start of the IEEE CIG and the AIIDE conference series in 2005.
Before then, research had been conducted on AI in board games since the dawn
of automatic computing. Initially, most of the work published at IEEE CIG or AIIDE
was concerned with learning to play a particular game as well as possible, or
using search/planning algorithms to play a game as well as possible without learning.
Gradually, a number of new applications for AI in games and for games in AI
have come to complement the original focus on AI for playing games [764]. Papers
on procedural content generation, player modeling, game data mining, human-like
playing behavior, automatic game testing and so on have become commonplace
within the community. As we saw in the previous chapter, there is also a recognition
that all these research endeavors depend on each other [785]. However, almost
all research projects in the game AI field are very specific. Most published papers
describe a particular method—or a comparison of two or more methods—for performing
a single task (playing, modeling, generating, etc.) in a single game. This is
problematic in several ways, both for the scientific value and for the practical applicability
of the methods developed and studies made in the field. If an AI approach
is only tested on a single task for a single game, how can we argue that it is an advance
in the scientific study of artificial intelligence? And how can we argue that
it is a useful method for a game designer or developer, who is likely working on a
completely different game than the one the method was tested on?

© Springer International Publishing AG, part of Springer Nature 2018
G. N. Yannakakis and J. Togelius, Artificial Intelligence and
Games, https://doi.org/10.1007/978-3-319-63519-4_7

As discussed in several parts of this book, general game playing is an area that
has already been studied extensively and constitutes one of the key areas of game AI
[785]. The focus of generality solely on play, however, is very narrow as the possible
roles of AI and general intelligence in games are many, including game design,
content design and player experience design. The richness of the cognitive skills
and affective processes required to successfully complete these tasks has so far been
largely ignored by game AI research. We thus argue that while the focus on general
AI needs to be retained, research on general game AI needs to expand beyond mere
game playing. The new scope for general general game AI beyond game-playing
broadens the applicability and capacity of AI algorithms and our understanding of
intelligence as tested in a creative domain that interweaves problem solving, art, and
engineering.

For general game AI to eventually be truly general, we argue that we need to 
extend the generality of general game playing to all other ways in which AI is (or can
be) applied to games. More specifically we argue that the field should move towards 
methods, systems and studies that incorporate three different types of generality: 

1. Game generality. We should develop AI methods that work with not just one
game, but with any game (within a given range) that the method is applied to.

2. Task generality. We should develop methods that can do not only one task (playing,
modeling, testing, etc.) but a number of different, related tasks.

3. User/designer/player generality. We should develop methods that can model, 
respond to and/or reproduce the very large variability among humans in design 
style, playing style, preferences and abilities. 

We further argue that all of this generality can be embodied into the concept of 
general game design, which can be thought of as a final frontier of AI research 
within games. Further details about the notion of general general game AI can be 
found in the vision paper we co-authored about this frontier research area [718]. It
is important to note that we are not arguing that more focused investigations into 
methods for single tasks in single games are useless; these are often important as 
proofs-of-concept or industrial applications and they will continue to be important 
in the future, but there will be an increasing need to validate such case studies in a 
more general context. We are also not envisioning that everyone will suddenly start 
working on general methods. Rather, we are positing generalizations as a long-term 
goal for our entire research community. Finally, the general systems of game AI that 
we envision ought to have a real-world use. There is a risk that by making systems 
too general we might end up not finding applications of these general systems to any 
specific real-world problem. Thus, the system’s applicability (or usefulness) sets our 
core constraint towards this vision of general game AI. More specifically, we envision
general general game AI systems that are nevertheless integrated successfully
within specific game platforms or game engines.


7.1.1 General Play 

The problem of playing games is the one that has been most generalized so far. There
already exist at least three serious benchmarks or competitions attempting to pose
the problem of playing games in general, each in its own imperfect way: the General
Game Playing Competition, often abbreviated GGP [223], the Arcade Learning
Environment BOl and the General Video Game AI competition [528]; all three have
been discussed in various places in this book. The results from these competitions
so far indicate that general purpose search and learning algorithms by far outperform
more domain-specific solutions and “clever hacks”. Somewhat simplified, we
can say that variations of Monte Carlo tree search perform best on GVGAI and
GGP [202], and for ALE (where no forward model is available so learning a policy
for each game is necessary) reinforcement learning with deep networks [464] and
search-based iterative width [389, 301, 390] perform best. This is a very marked
difference from the results of the game-specific competitions, indicating the lack of
domain-independent solutions.
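
To make the role of a forward model concrete, the following is a minimal sketch of an agent that never looks inside the game it plays. It uses flat Monte Carlo rollouts, which is a much simpler relative of Monte Carlo tree search and not the method used by any competition entry. The `ForwardModel` interface and the `CountToTen` toy game are assumptions made purely for illustration; they are not the GGP, ALE or GVGAI APIs.

```python
import random
from typing import Hashable, Protocol, Sequence


class ForwardModel(Protocol):
    """Hypothetical game-agnostic interface (an assumption, not a real API)."""

    def legal_actions(self, state: Hashable) -> Sequence[int]: ...
    def step(self, state: Hashable, action: int) -> Hashable: ...
    def is_terminal(self, state: Hashable) -> bool: ...
    def score(self, state: Hashable) -> float: ...


class CountToTen:
    """Toy game: add 1 or 2 to a counter; landing exactly on 10 scores 1."""

    def legal_actions(self, state):
        return (1, 2)

    def step(self, state, action):
        return state + action

    def is_terminal(self, state):
        return state >= 10

    def score(self, state):
        return 1.0 if state == 10 else 0.0


def rollout(game, state, rng):
    """Play uniformly random actions until the game ends; return the score."""
    while not game.is_terminal(state):
        state = game.step(state, rng.choice(game.legal_actions(state)))
    return game.score(state)


def choose_action(game, state, n_rollouts=200, seed=0):
    """Flat Monte Carlo: pick the action with the best mean rollout score."""
    rng = random.Random(seed)
    best_action, best_value = None, float("-inf")
    for action in game.legal_actions(state):
        next_state = game.step(state, action)
        total = sum(rollout(game, next_state, rng) for _ in range(n_rollouts))
        value = total / n_rollouts
        if value > best_value:
            best_action, best_value = action, value
    return best_action
```

The agent only calls the four interface methods, so any other game that implements them could be swapped in without changing `choose_action`; this is the sense in which such a player is general within a given range of games.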

While these are each laudable initiatives and currently the focus of much re- 
search, in the future we will need to expand the scope of these competitions and 
benchmarks considerably, including expanding the range of games available to play 
and the conditions under which gameplay happens. We need game playing benchmarks
and competitions capable of expressing any kind of game, including puzzle
games, 2D arcade games, text adventures, 3D action-adventures and so on; this is the 
best way to test general AI capacities and reasoning skills. We also need a number of 
different ways of interfacing with these games—there is room for both benchmarks 
that give agents no information beyond the raw screen data but give them hours to 
learn how to play the game, and those that give agents access to a forward model 
and perhaps the game code itself, but expect them to play any game presented to 
them with no time to learn. These different modes test different AI capabilities and 
tend to privilege different types of algorithms. It is worth noting that the GVGAI 
competition is currently expanding to different types of playing modes, and has a 
long-term goal to include many more types of games [527].

We also need to differentiate away from just measuring how to play games op- 
timally. In the past, several competitions have focused on agents that play games 
in a human-like manner; these competitions have been organized similarly to the 
classic Turing test [263, 619]. Playing games in a human-like manner is important
for a number of reasons, such as being able to test levels and other game content 
as part of search-based generation, and to demonstrate new content to players. So 
far, the question of how to play games in a human-like manner in general is mostly 
unexplored; some preliminary work is reported in [337]. Making progress here will
likely involve modeling how humans play games in general, including characteristics
such as short-term memory, reaction time and perceptual capabilities, and then
translating these characteristics to playing style in individual games.


7.1.2 General Game Generation and Orchestration 

The study of PCG [616] for the design of game levels has reached a certain maturity
and is, by far, the most popular domain for the application of PCG algorithms
and approaches (e.g., see [720, 785, 783] among many). What is common in most
of the content generation studies covered in this book, however, is their specificity
and strong dependency of the representation chosen on the game genre examined.
For the Mario AI Framework, for instance, the focus on a single level generation
problem has been very much a mixed blessing; it has allowed for the proliferation
and simple comparison of multiple approaches to solving the same problem, but has
also led to a clear overfitting of methods. Even though some limited generalization
is expected within game levels of the same genre, the level generators that have been
explored so far clearly do not have the capacity of general level design. We argue
that there needs to be a shift in how level generation is viewed. The obvious change
of perspective is to create general level generators—level generators with general
intelligence that can generate levels for any game (within a specified range). That
would mean that levels are generated successfully across game genres and players
and that the output of the generation process is meaningful and playable as well as
entertaining for the player. Further, a general level generator should be able to coordinate
the generative process with the other computational game designers who are
responsible for the other parts of the game design.

To achieve general level design intelligence, algorithms are required to capture
as much of the level design space as possible at different representation resolutions.
We can think of representation learning approaches such as deep autoencoders [739]
capturing core elements of the level design space and fusing various game genres
within a single representation—as already showcased by a few methods, such as the
Deep Learning Novelty Explorer 0731. The first attempt to create a benchmark for
general level generation has recently been launched in the form of the Level Generation
Track of the GVGAI competition. In this competition track, competitors submit
level generators capable of generating levels for unseen games. The generators are
then supplied with the description of several games, and produce levels which are
judged by human judges [338]. Initial results suggest that constructing competent
level generators that can produce levels for any game is much more challenging
than constructing competent level generators for a single game. A related effort is
the Video Game Level Corpus [669] which aims to provide a set of game levels
across multiple games and genres which can be used for training level generators
for data-driven procedural content generation.
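
As a rough illustration of what it means for a generator to receive a game description instead of being written for one game, consider the sketch below. The description format (a plain dictionary naming the symbols for walls, floor, avatar, goal and optional extra sprites) is a hypothetical stand-in invented for this example; it is not VGDL and not the interface of the GVGAI Level Generation Track. The generator is purely constructive and makes no attempt to judge whether the result is entertaining.

```python
import random


def generate_level(description, width=8, height=6, seed=0):
    """Build a bordered level using only symbols named in the description.

    `description` is a hypothetical dict (an assumption for this sketch):
    {"wall": "w", "floor": ".", "avatar": "A", "goal": "G", "extras": ["e"]}
    """
    rng = random.Random(seed)
    grid = [[description["floor"]] * width for _ in range(height)]
    for x in range(width):                      # top and bottom border
        grid[0][x] = grid[height - 1][x] = description["wall"]
    for y in range(height):                     # left and right border
        grid[y][0] = grid[y][width - 1] = description["wall"]
    interior = [(y, x) for y in range(1, height - 1)
                for x in range(1, width - 1)]
    rng.shuffle(interior)
    (ay, ax), (gy, gx) = interior[0], interior[1]
    grid[ay][ax] = description["avatar"]        # exactly one avatar
    grid[gy][gx] = description["goal"]          # exactly one goal
    extras = description.get("extras", [])
    for (y, x), symbol in zip(interior[2:], extras):
        grid[y][x] = symbol                     # one of each extra sprite
    return ["".join(row) for row in grid]
```

Because the function reads every symbol from the description, the same code produces levels for any game expressible in this toy format. A competent general generator would additionally need to reason about the rules of the game, which this sketch does not attempt.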

While level generation, as discussed above, is one of the main examples of procedural
content generation, there are many other aspects (or facets) of games that can
be generated. These include visuals, such as textures and images; narrative, such
as quests and backstories; audio, such as sound effects and music; and of course
all kinds of things that go into game levels, such as items, weapons, enemies and
personalities [381, 616]. However, an even greater challenge is the generation of
complete games, including some or all of these facets together with the rules of the
game. While, as covered in Chapter]^ there have been several attempts to generate
games (including their rules) we are not aware of any approach to generating
games that tries to generate more than two of the facets of games listed above. We
are also not aware of any game generation system that even tries to generate games
of more than one genre. Multi-faceted generation systems like Sonancia [394, 395]
co-generate horror game levels with corresponding soundscapes but do not cater to
the generation of rules. It is clear that the very domain-limited and facet-limited aspects
of current game generation systems result from intentionally limiting design
choices in order to make the very difficult problem of generating complete games
tractable. Yet, in order to move beyond what could be argued to be toy domains and
start to fulfill the promise of game generation, we need systems that can generate
multiple facets of games at the same time, and that can generate games of different
kinds.

This process has been defined as facet (domain) orchestration in games [371, 324].
Orchestration refers to the process of harmonizing game generation. Evidently,
orchestration is a necessary process when we consider the output of two
or more content type generators—such as visuals and audio—for the generation of
a complete game. Drawing inspiration from music, orchestration may vary from a
top-down, conductor-driven process to a bottom-up, free-form generation process
umi. A few years ago, something very much like general game generation and orchestration
was outlined as the challenges of “multi-content, multi-domain PCG”
and “generating complete games” [702]. It is interesting to note that there has not
seemingly been any attempt to create more general game generators since then, perhaps
due to the complexity of the task. A recent study by Karavolos et al. lEU
moves towards the orchestration direction as it fuses level and game design parameters
in first-person shooters via deep convolutional neural networks. The trained
networks can be used to generate balanced games. Currently the only genre for
which generators have been built that can generate high-quality (complete) games
is abstract board games. Once more genres have been “conquered”, we hope that
the task of building more general level generators can begin.

Linked to the tasks of orchestration and general game generation there are important
questions with respect to the creative capacity of the generation process that
remain largely unanswered. For example, how creative can a generator be and how
can we assess it? Is it, for instance, deemed to have appreciation, skill, and imagination
[130]? When it comes to the evaluation of the creative capacity of current PCG
algorithms a case can be made that most of them possess only skill. Does the creator
manage to explore novel combinations within a constrained space, thereby resulting
in exploratory game design creativity ||5^; or is it, on the other hand, trying to break
existing boundaries and constraints within game design to come up with entirely
new designs, demonstrating transformational creativity ll5^? If used in a mixed-initiative
fashion, does it enhance the designer’s creativity by boosting the possibility
space for her? Arguably, the appropriateness of various evaluation methods
for autonomous PCG creation or mixed-initiative co-creation [774] remains largely
unexplored within both human and computational creativity research.


7.1.3 General Game Affective Loop 

It stands to reason that general intelligence implies (and is tightly coupled with)
general emotional intelligence [443]. The ability to recognize human behavior and
emotion is a complex yet critical task for human communication that acts as a facilitator
of general intelligence [157]. Throughout evolution, we have developed
particular forms of advanced cognitive, emotive and social skills to address this
challenge. Beyond these skills, we also have the capacity to detect affective patterns
across people with different moods, cultural backgrounds and personalities. This
generalization ability also extends, to a degree, across contexts and social settings.
Despite their importance, the characteristics of social intelligence have not yet been
transferred to AI in the form of general emotive, cognitive or behavioral models.
While research in affective computing [530] has reached important milestones such
as the capacity for real-time emotion recognition [794]—which can be faster than
humans under particular conditions—all key findings suggest that any success of
affective computing is heavily dependent on the domain, the task at hand, and the
context in general. This specificity limitation is particularly evident in the domain
of games [781] as most work in modeling player experience focuses on particular
games, under well-controlled conditions with particular, small sets of players
(see II78311609116101 l4T5l among many). In this section we identify and discuss two
core unexplored and interwoven aspects of modeling players that are both important
and necessary steps towards the long-term aim of game AI to realize truly adaptive
games. The first aspect is the closure of the affective loop in games; the second aspect
is the construction of general models capable of capturing experience across
players and games.

As stated at the start of this book, affective computing is best realized within
games in what we name the game affective loop. While the phases of emotion elicitation,
affect modeling and affect expression have offered some robust solutions by
now, the very loop of affective-based interaction has not been closed yet. Aside from
a few studies demonstrating some affect-enabled adaptation of the game [772, 617]
the area remains largely unexplored. It is not only the complexity of modeling players
and their experience that is the main hurdle against any advancement. What is
also far from trivial is the appropriate and meaningful integration of any of these
models in a game. The questions of how often the system should adapt, what it
should alter and by what degree are not easy to answer. As most of the questions are
still open to the research community the only way to move forward is to do more
research in adaptive games involving affective aspects of the experience. Existing
commercial-standard games that already realize the affective loop such as Nevermind
(Flying Mollusk, 2016) are the ambassadors for further work in this area.

Once the game affective loop is successfully realized within particular games the
next goal for game AI is the generality of affect-based interaction across games.
The game affective loop should not only be operational; it should ideally be general
too. For AI in games to be general beyond game-playing it needs to be able to recognize
general emotional and cognitive-behavioral patterns. This is essentially AI that
can detect context-free emotive and cognitive reactions and expressions across contexts
and builds general computational models of human behavior and experience
which are grounded in a general gold standard of human behavior. So far we have
only seen a few proof-of-concept studies in this direction. Early work within the
game AI field focused on the ad-hoc design of general metrics of player interest that
were tested across different prey-predator games [768, 767]. In other, more recent,
studies predictors of player experience were tested for their ability to capture player
experience across dissimilar games [431, 612, 97]. Another study on deep multimodal
fusion can be seen as an embryo for further research in this direction [435],
in which various modalities of player input such as player metrics, skin conductance
and heart activity have been fused using stacked autoencoders. Discovering entirely
new representations of player behavior and emotive manifestations across games,
modalities of data, and player types is a first step towards achieving general player
modeling. Such representations can, in turn, be used as the basis for approximating
the ground truth of user experience in games.


7.2 AI in Other Roles in Games 

The structure of this book reflects our belief that playing games, generating content 
and modeling players are the central applications of AI methods in games. However,
there are many variants and use cases of game playing, player modeling or
content generation that we have not had time to explore properly in the book, and
which in some cases have not been explored in the literature at all. Further, there are
some applications of AI in games that cannot be really classified as special cases 
of our “big three” AI applications in games, despite our best efforts. This section 
briefly sketches some of these applications, some of which may be important future 
research directions. 

Playtesting: One of the many use cases for AI for playing games is to test the 
games. Testing games for bugs, balancing player experience and behavior, and other 
issues is important in game development, and one of the areas where game devel- 
opers are looking for AI assistance. While playtesting is one of the AI capabilities 
within many of the mixed-initiative tools discussed in Chapterj^ there has also been 
work on AI-based playtesting outside of that context. For example, Denzinger et al.
evolved action sequences to find exploits in sports games, with discouragingly good 
results OSI- For the particular case of finding bugs and exploits in games, one of 
the research challenges is to find a good and representative coverage of problems, 
so as to deliver an accurate picture to the development team of how many problems 
there are and how easy they are to run into, and allow prioritization of which problems
to fix.
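
The idea of searching over action sequences for exploits can be sketched with a very small evolutionary loop. The code below is a generic (1+1) evolutionary search on a toy simulator with a deliberately planted scoring bug. It is not a reconstruction of the method of Denzinger et al., and the action names, the `simulate` function and the planted bug are all assumptions made for illustration.

```python
import random

ACTIONS = ("left", "right", "jump", "shoot")


def simulate(actions):
    """Toy stand-in for a game simulator: returns the score of a sequence.

    The deliberate 'bug': every 'jump' immediately followed by 'shoot'
    awards a point, which no designer intended.
    """
    pairs = zip(actions, actions[1:])
    return sum(1 for a, b in pairs if (a, b) == ("jump", "shoot"))


def evolve_exploit(length=20, generations=500, seed=0):
    """(1+1) evolutionary search for a high-scoring action sequence."""
    rng = random.Random(seed)
    parent = [rng.choice(ACTIONS) for _ in range(length)]
    parent_score = simulate(parent)
    for _ in range(generations):
        child = list(parent)
        child[rng.randrange(length)] = rng.choice(ACTIONS)  # point mutation
        child_score = simulate(child)
        if child_score >= parent_score:  # accept ties to cross plateaus
            parent, parent_score = child, child_score
    return parent, parent_score
```

A search like this returns a single best sequence. The coverage problem raised above would require keeping a diverse archive of distinct problem-triggering sequences, which this sketch omits.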

Critiquing Games: Can AI methods meaningfully judge and critique games? Game 
criticism is hard and generally depends on deep understanding of not only games but 
also the surrounding cultural context. Still, there might be automated metrics that are
useful for game criticism, and can provide information to help reviewers, game curators
and others in selecting which games to consider for reviewing for inclusion
in app stores. The ANGELINA game generation system is one of the few examples
towards this direction [136] in which AI generates the overview of the game to be
played. 

Hyper-formalist Game Studies: AI methods can be applied to corpora of games 
in order to understand distributions of game characteristics. For example, decision 
trees can be used to visualize patterns of resource systems in games [312]. There
are likely many other ways of using game AI for game studies that are still to be
discovered. 

Game Directing: The outstanding feature of Left 4 Dead (Valve Corporation, 2008) 
was its AI director, which adjusted the onslaught of zombies to provide a dramatic 
curve of challenge for players. While simple and literally one-dimensional (only a 
single dimension of player experience was tracked), the AI director proved highly 
effective. There is much room for creating more sophisticated AI directors; the 
experience-driven PCG framework [783] is one potential way within which to work
towards this. 
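
A one-dimensional director of the kind described above can be sketched as a simple feedback controller: it compares an estimate of the player's current intensity with the intensity the dramatic curve calls for, and nudges the spawn rate accordingly. The class below is a hypothetical illustration under those assumptions. It is not Valve's implementation, and the parameter names, units and default values are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Director:
    """Hypothetical one-dimensional director (not Valve's implementation)."""

    spawn_rate: float = 1.0  # enemies per second (illustrative unit)
    gain: float = 0.5        # how strongly to react to the intensity gap
    min_rate: float = 0.0
    max_rate: float = 5.0

    def update(self, observed_intensity: float, target_intensity: float) -> float:
        """Move the spawn rate so as to close the gap to the target curve."""
        error = target_intensity - observed_intensity
        proposed = self.spawn_rate + self.gain * error
        # Clamp so the director can never spawn a negative or absurd rate.
        self.spawn_rate = min(self.max_rate, max(self.min_rate, proposed))
        return self.spawn_rate
```

Calling `update` once per decision interval with a target taken from a designer-authored curve (for instance build-up, peak, relax) yields the kind of pacing described in the text. A more sophisticated director would track several dimensions of player experience rather than a single scalar.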

Creative Inspiration: While designing a complete game that actually works likely 
requires a very complex generator, it can be simpler to generate ideas for new
games, which are then designed by humans. Creative ideation tools range from simple
word-recombination-based tools implemented as card games or Twitter bots, to
elaborate computational creativity systems such as the What-If Machine imi.

Chat Monitoring: In-game chats are important in many online multi-player games,
as they allow people to collaborate within games and socialize through them. Unfortunately,
such chats can also be used to threaten or abuse other players. Given
the very large volume of chat messages sent through a successful online game, it
becomes impossible for the game developers to curate chats manually. In the efforts
to combat toxic behavior, some game developers have therefore turned to machine
learning. Notably, Riot Games have trained algorithms to recognize and remove
toxic behavior in the MOBA League of Legends (Riot Games, 2009) [413]. Even
worse, sexual predation can be seen in some games, where pedophiles use game
chats to reach children; there have been attempts to use machine learning to detect
sexual predators in game chats too [241].


AI-Based Game Design: Throughout most of the book, we have assumed the existence
of a game or at least a game design, and discussed how AI can be used to
play that game, generate content for it or model its players. However, one could also
start from some AI method or capability and try to design a game that builds on that
method or capability. This could be seen as an opportunity to showcase AI methods
in the context of games, but it could also be seen as a way of advancing game design.
Most classic game designs originate in an era where there were few effective
AI algorithms, there was little knowledge among game designers about those AI
algorithms that existed, and CPU and memory capacity of home computers was too
limited to allow anything beyond simple heuristic AI and some best-first search to
be used. One could even say that many classic video game designs are an attempt
to design around the lack of AI—for example, the lack of good dialog AI for NPCs
led to the use of dialog trees, the lack of AIs that could play FPS games believably
and competently led to FPS game designs where most enemies are only on-screen
for a few seconds so that you do not notice their lack of smarts, and the lack of
level generation methods that guaranteed balance and playability led to game designs
where levels did not need to be completable. The persistence of such design
patterns may be responsible for the relatively low utilization of interesting AI methods
within commercial game development. By starting with the AI and designing
a game around it, new design patterns that actually exploit some of the recent AI
advances can be found.

Several games have been developed within the game AI research community
specifically to showcase AI capabilities, some of which have been discussed in this
book. Three of the more prominent examples are based on Stanley et al.’s work
on neuroevolution and the NEAT algorithm: NERO, which is an RTS-like game
where the player trains an army through building a training environment rather
than controlling it directly iH; Galactic Arms Race, in which weapons controlled
through neural networks are indirectly collectively evolved by thousands of
players 02501 124^; and Petalz, which is a Facebook game about collecting flowers
based on a similar idea of selection-based collective neuroevolution [565, 566].
Other games have been built to demonstrate various adaptation mechanisms, such
as Infinite Tower Defense [25] and Maze-Ball [780]. Within interactive narrative it is
relatively common to build games that showcase specific theories and methods; a famous
example is Façade [441] and another prominent example is Prom Week [447].
Treanor et al. have attempted to identify AI-based game design patterns, and found
a diverse array of roles in which AI can be or has been used in games, and a number
of avenues for future AI-based game design [724].


7.3 Ethical Considerations 

Like all technologies, artificial intelligence, including game AI, can be used for
many purposes, some of them nefarious. Perhaps even more importantly, technology
can have ethically negative or at least questionable effects even when there is no
malicious intent. The ethical effects of using AI with and in games are not always
obvious, and the topic is not receiving the attention it should. This short section
looks at some of the ways in which game AI intersects with ethical questions. For
general AI research issues, ethics and values we refer the interested reader to the
Asilomar AI Principles developed in conjunction with the 2017 Asilomar conference.

Player modeling is perhaps the part of game AI where the ethical questions are 
most direct, and perhaps most urgent. There is now a vigorous debate about the 
mass collection of data about us both by government entities (such as the US National
Security Agency or the United Kingdom’s GCHQ) and private entities (such
as Google, Amazon, Facebook and Microsoft) [64, 502]. With methodological advances
in data mining, it is becoming possible to learn more and more about individual
people from their digital traces, including inferring sensitive information
and predicting behavior. Given that player modeling involves large-scale data collection
and mining, many of the same ethical challenges exist in player modeling as
in the mining of data about humans in general. Mikkelsen et al. present an overview
of ethical challenges for player modeling [458]. Below we give some examples of
such challenges. 

Privacy: It is becoming increasingly possible and even practicable to infer various
real-life traits and properties of people from their in-game behavior. This can
be done without the consent or even knowledge of the subject, and some of the
information can be of a private and sensitive nature. For example, Yee and colleagues
investigated how player choices in World of Warcraft (Blizzard Entertainment, 2004)
correlated with the personalities of players. They used data about players’ characters
from the Armory database of World of Warcraft (Blizzard Entertainment, 2004) and
correlated this information with personality tests administered to players; multiple
strong correlations were found [788]. In a similar vein, a study investigated how
players’ life motives correlated with their Minecraft (Mojang, 2011) log files [101].
That research used the life motivation questionnaires of Steven Reiss, and found
that players’ self-reported life motives (independence, family, etc.) were expressed
in a multitude of ways inside constructed Minecraft (Mojang, 2011) worlds. Using
a very different type of game, strong correlations have been found between playing
style in the first-person shooter Battlefield 3 (Electronic Arts, 2011) and player
characteristics such as personality [687], age [686] and nationality [46]. It is
entirely plausible that similar methods could be used to infer sexual preferences,
political views, health status and religious beliefs. Such information could be used by
advertising networks to serve targeted ads, by criminals looking to blackmail the
player, by insurance companies looking to differentiate premiums, or by malevolent
political regimes for various forms of suppression. We do not know yet what can be
predicted and with what accuracy, but it is imperative that more research be done on
this within the publicly available literature; it is clear that this kind of research will
also be carried out behind locked doors.
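
To make the privacy concern concrete, the following is a minimal sketch of the
kind of analysis described above: correlating one in-game behavioral measure with
one questionnaire score across players. It is not the method of any of the cited
studies. The function name pearson_r, the variable names and all numbers are
hypothetical and chosen only for illustration.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of co-deviations and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data, one entry per player (invented for illustration):
# hours spent building alone, and a self-reported "independence" score.
hours_building_alone = [2.0, 3.5, 1.0, 4.0, 5.5, 0.5]
independence_score = [10.0, 14.0, 8.0, 15.0, 20.0, 7.0]

r = pearson_r(hours_building_alone, independence_score)
print(f"correlation between play behavior and questionnaire score: {r:.2f}")
```

The point of the sketch is how little machinery is needed: once behavioral logs
and a handful of questionnaire responses are available, a few lines of standard
statistics are enough to start inferring personal traits of players who never
filled in any questionnaire.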


* https://futureoflife.org/ai-principles/

Ownership of Data: Some player data can be used to recreate aspects of the player’s
behavior; this is the case for e.g., the Drivatars in Microsoft’s Forza Motorsport
series, and more generally for agents created according to the procedural persona
concept [267]. It is currently not clear who owns this data, and if the game
developer/publisher owns the data, what they can do with it. Will the game allow other
people to play against a model of you, i.e., how you would have played the game?
If so, can it identify you to other players as the origin of this data? Does it have to
be faithful to the behavioral model of you, or can it add or distort aspects of your
playing behavior?

Adaptation: Much of the research within game AI is concerned with adaptation
of games, with the experience-driven PCG framework being perhaps the most
complete account of how to combine player modeling with procedural content
generation to create personalized game experiences [783]. However, it is not clear that
it is always a good thing to adapt games to players. The “filter bubble” is a
concept within discussion of social networks which refers to the phenomenon where
collaborative filtering ensures that users are only provided with content that is
already in line with their political, ethical, or aesthetic preferences, leading to a lack
of healthy engagement with other perspectives. Excessive adaptation and
personalization might have a similar effect, where players are funneled into a narrow set of
game experiences.

Stereotypes: Anytime we train a model using some dataset, we run the risk of 
reproducing stereotypes within that dataset. For example, it has been shown that 
word embeddings trained on standard datasets of the English language reproduce
gender-based stereotypes 1^ . The same effects could be present when modeling 
player preferences and behavior, and the model might learn to reproduce prejudiced 
conceptions regarding gender, race, etc. Such problems can be exacerbated or
ameliorated by the tools made available to players for expressing themselves in-game.
For example, Lim and Harrell have developed quantitative methods for measuring
and addressing bias in character creation tools [386].


Censorship: Of course, it is entirely possible, and advisable, to use AI methods to
promote ethical behavior and uphold ethical values. Earlier, Section 7.2 discussed
the examples of AI for filtering player chats in online multi-player games, and for
detecting sexual predators. While such technologies are generally welcome, there
are important ethical considerations in how they should be deployed. For example,
a model that has been trained to recognize hate speech might also react to normal
in-game jargon; setting the right decision threshold might involve a delicate tradeoff
between ensuring a welcoming game environment and not restricting
communications unduly.
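
The threshold tradeoff mentioned above can be illustrated with a small sketch.
It assumes a hypothetical chat classifier that outputs a score between 0 and 1
for each message; the scores, the labels and the function name
rates_at_threshold are invented for illustration and do not come from any real
system.

```python
from typing import List, Tuple


def rates_at_threshold(
    scores: List[float], is_abusive: List[bool], threshold: float
) -> Tuple[float, float]:
    """Return (recall on abusive messages, false-positive rate on benign ones)."""
    flagged = [s >= threshold for s in scores]
    true_pos = sum(f and y for f, y in zip(flagged, is_abusive))
    false_pos = sum(f and not y for f, y in zip(flagged, is_abusive))
    positives = sum(is_abusive)
    negatives = len(is_abusive) - positives
    recall = true_pos / positives if positives else 0.0
    fpr = false_pos / negatives if negatives else 0.0
    return recall, fpr


# Hypothetical classifier scores and ground-truth labels for seven messages.
# The benign message scored 0.65 stands in for harmless in-game jargon.
scores = [0.95, 0.80, 0.65, 0.60, 0.40, 0.30, 0.10]
labels = [True, True, False, True, False, False, False]

for threshold in (0.5, 0.7, 0.9):
    recall, fpr = rates_at_threshold(scores, labels, threshold)
    print(f"threshold={threshold:.1f}  recall={recall:.2f}  "
          f"false-positive rate={fpr:.2f}")
```

With these invented numbers, the lowest threshold catches every abusive message
but also blocks the piece of jargon, while the higher thresholds stop blocking
benign messages at the cost of letting abusive ones through. Where to set the
threshold is therefore a value judgment as much as a technical one.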


AI Beyond Games: Finally, a somewhat more far-fetched concern, but one we
believe still merits discussion, is the following. Games are frequently used to train and
test AI algorithms; this is the main aim of, for example, the General Video Game
AI Competition and the Arcade Learning Environment. However, given how many
games are focused on violent competition, does this mean that we focus unduly on
the development of violence in artificial intelligence? What effects could this have
on AI that is trained on games but employed in other domains, such as transport or
health care?


7.4 Summary 

In this last chapter of this book we went through directions that we view as critical
and important for the advancement of the game AI field. Initially we argued
that the general intelligence capacity of machines needs to be both explored
and exploited to its full potential (1) across the different tasks that exist within the
game design and development process, including but absolutely no longer limited
to game playing; (2) across different games within the game design space; and (3)
across different users (players or designers) of AI. We claim that, thus far, we have
underestimated the potential for general AI within games. We also claim that the
currently dominant practice of only designing AI for a specific task within a specific
domain will eventually be detrimental to game AI research as algorithms, methods
and epistemological procedures will remain specific to the task at hand. As a result,
we will not manage to push the boundaries of AI and exploit its full capacity for
game design. We are inspired by the general game-playing paradigm and the recent
successes of AI algorithms in that domain and suggest that we become less specific
about all subareas of the game AI field including player modeling and game
generation. Doing so would allow us to detect and mimic different general cognitive and
emotive skills of humans when designing games. It is worth noting, again, that we
are not advocating that all research within the game AI field focuses on generality
right now; studies on particular games and particular tasks are still valuable, given
how little we still understand and can do. But over time, we predict that more and
more research will focus on generality across tasks, games and users, because it is
in the general problems that the interesting research questions of the future lie. It
seems that we are not alone in seeing this need as other researchers have argued for
the use of various game-related tasks (not just game playing) in artificial
general intelligence research [799].

The path towards achieving general game artificial intelligence is still largely
unexplored. For AI to become less specific, yet remain relevant and useful for game
design, we envision a number of immediate steps that could be taken. First and
foremost, the game AI community needs to adopt an open-source accessible strategy so
that methods and algorithms developed across the different tasks are shared among
researchers for the advancement of this research area. Venues such as the current
game AI research portal^ could be expanded and used to host successful methods
and algorithms. For the algorithms and methods to be of direct use, particular
technical specifications need to be established (such as those established within
game-based AI benchmarks), which will maximize the interoperability among the
various tools and elements submitted. Examples of benchmarked specifications for
the purpose of general game AI research include the general video game description
language and the puzzle game engine PuzzleScript^. Finally, following the GVGAI
competition paradigm, we envision a new set of competitions rewarding general
player models, AI-assisted tools and game generation techniques. These
competitions would further motivate researchers to work in this exciting research area and
enrich the database of open-access interoperable methods and algorithms, directly
contributing to the state of the art in computational general game design.

^ http://www.aigameresearch.org/

Beyond generality we also put a focus on the extensibility of AI roles within
games. In that regard, we outlined a number of AI roles that are underrepresented
currently but nevertheless define very promising research frontiers for game AI.
These include the roles of AI as playtester, game critic, game studies formalist,
director, creative designer, and gameplay ethics judge. Further, we view the placement
of AI at the very center of the design process (AI-based game design) as another
critical research frontier.

This chapter, and the book itself, concluded with a discussion on the ethical
implications of whatever we do in game AI research. In particular, we discussed
aspects such as the privacy and ownership of data, the considerations about game
adaptation, the emergence of stereotypes through computational models of players,
the risks of AI acting as a censor, and finally the ethical constraints imposed on AI
by “unethical” aspects of the very nature of games.


^ http://www.puzzlescript.net/